SI 413 Fall 2021 / Labs


This is the archived website of SI 413 from the Fall 2021 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

Lab 7: Abstract Syntax Trees

This lab is due at 2359 on Wednesday, 20 October. It should be submitted to the course SI413 as Lab07, with the files named properly according to the lab instructions. See the submit page for details. You are highly encouraged to submit whatever you have done by the end of this lab time. Remember that you must thoroughly test your code; the tests that are visible before the deadline are only there to make sure you are using the right names and input/output formats.

1 Starter Code

This lab starts a new group of labs. This means two things. First, you need to choose a new lab partner and make sure to notate that on the shared Google sheet.

Second, you need to create a new repo for your shared work running the following command. (One lab partner should run this command first to create the shared repo, and then the other lab partner should run it next to clone the repo.)

roche@ubuntu$ 413repo spl

If that command doesn't work, check the initial setup instructions.

You should now have a folder like si413/spl/lab07 with the starter code for this lab and a handy script to submit your code.

You can also browse the starter code from the Gitlab page (USNA intranet only).

2 Introduction

From today's lab all the way until the end of the semester, we will be working on interpreting and compiling a made-up programming language called SPL ("Simple Programming Language"). Here's how it'll happen:

  • Lab 7 (today!): Interpreting an Abstract Syntax Tree.
    Gaining familiarity with OOP in C++ and the AST structure allows us to interpret all language features except functions, using a single global scope.
  • Lab 8: Functions and Lexical scoping with frames.
    We will implement the one feature missing from today's lab - functions! This requires us to implement lexical scoping using activation frames. That allows our language to support advanced functional programming features using closures, like we saw in Scheme.
  • Lab 9: Type checking.
    We will add both dynamic and static type-checking features to our interpreter to support richer feedback and reliability for the SPL programmer.
  • Lab 10: Garbage collection.
    Automatic garbage collection will be added to our interpreter using the mark-and-sweep paradigm, to prevent memory leaks and make our interpreter more robust.
  • Labs 11 and 12: Compiler.
    Finally, in this exciting double lab, we will foray briefly into actual compilation by producing LLVM intermediate expression code, the same thing that is used by the popular clang/clang++ compilers to compile C and C++ programs. As a result, you will have a complete working compile that turns a .spl file in our made-up language into an actual executable.

3 OOP in C++

We will be using object-oriented programming in C++ to write the SPL interpreter. Luckily, most of the class structure is already provided by me, and you just have to fill in some of the methods. But here's a summary of how OOP works syntactically in C++. (You should already understand how it works semantically from Java!) All of the examples below are taken from the ast.hpp file from today's starter code.

  • Classes are declared using the class keyword. Superclasses are specified using a colon (like the extends keyword in Java). For example
    class Stmt :public AST {
      /* fields and methods in here */
    };
    declares a class called Stmt that is a subclass of AST. The public keyword is necessary and tells us that the methods in AST can be accessed from a Stmt object by anybody.
    And don't forget that class declarations must end with a semicolon in C++!
  • Fields and methods in a class can be either private, protected, or public. We specify these by grouping declarations with the same visibility and then writing visibility specifiers before them, like public:.
  • If we want to use polymorphism in a class method, we declare the method with the virtual keyword, like
    virtual void exec() {...}
    This allows sub-classes to override that method in the way that we expect from Java. (In Java, essentially every method is declared virtual.)
  • There is no special syntax for interfaces, nor is there an abstract keyword as in Java. Instead, to declare a method abstract (i.e., it must be overridden by subclasses), we write = 0 instead of the method body, like
    virtual void ASTchild(AST* child) = 0;
  • When a virtual method is overridden in a base class, the overrides keyword can be used to tall the compiler that you really intend to be overriding a base class method, like:
    void exec() override { ... }
    The override keyword is actually optional, but it's really useful because you'll get an error message if there is no matching base class method (for example, if you get the signature wrong by accident).
  • Constructors work similarly to Java: a constructor is a special method with no return type, and whose name matches the name of the class. Something different in C++ is the presence of destructors, which are special methods with no return type, no parameters, and whose name is a tilde character `~` followed by the name of the class. Destructors in C++ are responsible to "clean up" when an object is being destroyed. (You shouldn't have to change any descructors for this lab, but it's useful to be aware of what these things are.)
    Destructors are almost always declared virtual, to make sure they get called on subclasses too.
    Here is an example of a class with a no-argument constructor as well as a destructor.
    class AST {
      // ... various other stuff ...
      public:
        /* Makes a new "empty" AST node. */
        AST() { nodeLabel = "EMPTY"; }
    
        /* Deallocates memory for this AST note and its children. */
        virtual ~AST() {
          for (AST* child : children) delete child;
          children.clear();
        }
    };
  • Methods can be defined within the class declaration, like in Java. But if we did this all the time, especially when there are many classes defined in the same file, the file gets pretty cluttered and hard to read.
    So instead of defining methods within the class declaration, we can just declare the method's prototype there, and fully define the method in some .cpp file later.
    For instance, the following class definition appears in ast.hpp:
    class ArithOp :public Exp {
      public:
        Value eval() override;
    
      // ... other fields and methods ...
    };
    Notice that there is no function body for the eval method. That is saved for the ast.cpp file:
    Value ArithOp::eval() {
      int l = left->eval().num();
      int r = right->eval().num();
      switch(op) {
        case ADD: return l + r;
        case SUB: return l - r;
        case MUL: return l * r;
        case DIV:
          if (r != 0) return l / r;
          else if (!error) {
            error = true;
            errout << "ERROR: Divide by zero" << endl;
          }
          return Value();
        default:  return Value(); // shouldn't get here...
      }
    }
    Since this is outside the class declaration, we must specify the name of the class in front of the method name and use the scope resolution operator :: like ArithOp::eval.
    There is no set rule about when a method should be declared in the cpp file versus the hpp file, but it's probably a good idea once the method body gets longer than 4 lines or so. Another good rule of thumb is to try and keep the class declaration (in the hpp file) on one "screenfull", which will be roughly 50 lines depending on your font and editor.

4 The SPL language

SPL is an extension of the calculator language we have been using to include a few control structures as well as simple calculator instructions. The syntax is mostly similar to C++ and Java but much more limited, and there are some very important differences (as we will see). Specifically, SPL supports:

  • Integers like 1234 and booleans true and false
  • Basic arithmetic (+, -, *, /)
  • Numeric comparisons (=,!=, >, >=, <, <=)
  • Boolean operations (and, or, not)
  • Input/output (read and write)
  • Basic control structures (if, ifelse, and while)
  • Variables, declared with the new keyword and assigned using the := operator.
  • User-defined functions. These must be unary (single-argument), and are defined with lambda statements.
    NOTE: We will deal with functions in all their glory in later labs, but for this lab you don't have to worry about lambdas or function calls.

A scanner and parser for SPL have already been written for you; they are in the files spl.lpp and spl.ypp. Check them out for the full details of the SPL syntax. Here's an overview of the grammar:

res: stmt
|

block: LC stmtlist RC

stmtlist: stmtlist stmt
|

stmt: NEW ID ASN exp STOP
|     ID ASN exp STOP
|     WRITE exp STOP
|     IF exp block
|     IFELSE exp block block
|     WHILE exp block
|     block

exp: exp BOP exp
|    NOTTOK exp
|    exp COMP exp
|    exp OPA exp
|    exp OPM exp
|    OPA exp
|    READ
|    LAMBDA ID block
|    exp FUNARG exp
|    LP exp RP
|    ID
|    NUM
|    BOOL

You can see the token names in the grammar above. First we have the standard punctuation: LC and RC are curly braces, LP and RP are parentheses, and STOP is a semicolon. ID is a variable name (which can contain letters and digits), NUM is an series of digits, and BOOL is either true or false.

The operators of the language are defined by these tokens:

Token name | Operator(s)
-----------|------------
       ASN | :=
       BOP | and or
    NOTTOK | not
      COMP | = != < <= > >=
       OPA | + -
       OPM | * /
    FUNARG | @

Finally the keywords in SPL are:

new write if ifelse while read lambda true false

Here are a few interesting aspects of SPL that you should be aware of:

  • There are no declared types. Variables are declared and given an initial value with a statement like
    new x := 10;
    and afterwards, they can be reassigned using just the := operator without the new keyword.
  • The conditions in if and while statements do not need to be enclosed in parentheses. For example, here is a program to find (and then print) the sum of 1 up to 100:
    new i := 1;
    new sum := 0;
    while i <= 100 {
      sum := sum + i;
      i := i + 1;
    }
    write sum;
  • A regular if statement (with no else) looks like you might expect:
    if 1 < 10 { write true; }
    But if we have an if-else, you use the keyword ifelse instead of if, followed by the condition and the two code blocks - no separate else keyword is used:
    ifelse false { write 100; } { write -100; }
  • Function calls are a bit different than the norm. We won't really have to deal with these until next week, but here's a primer. Every function is a unary function (takes one argument), and return values are specified by assigning to the special variable ret. So here is a function to compute the factorial of its argument:
    # This is a one-argument function to compute factorials.
    new fact := lambda n {
      new prod := 1;
      new i := 1;
      while i <= n {
        prod := prod * i;
        i := i + 1;
      }
      ret := prod;
    };
    (Did I mention? Pound signs # start single-line comments.)
    Once we have a function defined, you call it with the @ operator, like function @ operand. Using the function just defined, we could print out 8 factorial with:
    write fact@8;

Exercises

  1. Write an SPL program (without any lambdas) that reads in integers until 0 is entered, at which time the sum of all the integers read in is printed to the screen.
    Submit your program in a file called ex1.spl. And call over your instructor to go over the AST that results when you run your program through the existing spl program from the starter code. Note: you should enclose your program in a single block with curly braces so that it prints as a single AST.

5 The SPL interpreter

As you may have noticed, the starter code for today's lab is fairly extensive. This is intimidating, because you're going to be modifying a piece of software that you didn't create and that you might not fully understand yet. But the good news is that you don't have to write everything from scratch! I suggest you dive in with the exercises below and don't worry about parts of the code until you need to. As always, ask your instructor as you encounter difficulties!

Here's an overview of how the interpreter works. This is a process you should be pretty familiar with by now, although you haven't seen it all in working C++ code before:

  1. The scanner (spl.lpp) reads in the SPL program and breaks it up into tokens.
  2. The parser (spl.ypp) organizes these tokens according to the grammar rules. As a side-effect to parsing, an Abstract Syntax Tree (AST) is created.
  3. Every node in the AST is a pointer to an object of a subclass of AST. The two main subclasses are Stmt for statements and Exp for expressions. Every Stmt subclass has an exec() method to do whatever it is that statement does, and each subclass of Exp has an eval() method that returns the value that that expression computes to.
  4. The top node in the AST (which must be a Stmt) is executed, which recursively causes other statements to be executed and various expressions to be evaluated. All these methods are declared in the ast.hpp file.

Your task in this lab is to write all those exec and eval methods, except for the ones having to do with lambdas and function calls. You can see all the types of statements and expressions by looking at the top of the ast.hpp file (around line 45). Each one of those classes needs a shiny new exec() or eval() method, except for the following which I've already written for you: NullStmt, Write, Num, and ArithOp.

The exercises below give a nice ordering and hints on completing the implementation. Good luck!

6 Operations

Since Num, ArithOp, and Write have already been written for you, simple SPL programs that do some light number crunching work already with the starter code:

spl> write 10;
10
spl> write 5 + 3;
8
spl> write 18 - 15 / 3;
13

But alas, none of the following examples work:

spl> write -(5 + 5);
-10
spl> write 4 > 3;
true
spl> write 11 = 12;
false
spl> write true;
true
spl> write true and false;
false
spl> write false or true;
true
spl> write not true;
false
spl> write 6 + 3 != 2 and not 6 + 3 <= 2;
true

So fix it! This requires implementing the eval methods in the classes mentioned below. For each of these, you will have to declare a new method in the class with a line like

Value eval();
You can then either define the method right there in the ast.hpp file, or externally in the ast.cpp file. Remember, external definitions require you to specify the class name, like
Value ArithOp::eval() { /* code here */ }

Notice that the eval method in ArithOp has a switch statement depending on which operator is being used. Look in the class declaration for how the operator is stored, and see the top of ast.hpp for the enum type Oper that you will need to use.

Remember that an operation like + must evaluate its arguments first before using them! This is accomplished by calling the eval method recursively. For instance, the first argument to an ArithOp is evaluated by calling left->eval().

Look at the definition of the Value class in value.hpp. This is the type returned by the recursive calls to eval(). You will be interested in using the num() method, which returns the integer value that is stored in the object. (The tf() method returns the bool value instead.) Also note that we have constructors from the basic types, so for example to return a numerical value of 8, you would write something like return Value(8);. A line like this constructs and then returns a Value object holding the integer (or boolean) value that it's given.

Tip: once you get your interpreter going, it will start to get really annoying having the AST pop up on the screen every time. To get rid of it, just change the line

bool showAST = true;
in the main method of spl.ypp to false. This will ensure that the AST still gets generated and saved in spl.pdf, but you don't have to look at it unless you need to.

Exercises

  1. Write the eval() method in NegOp, so that unary negation with the - sign works.
  2. Write the eval() method in CompOp, so that integer comparison operations like 5 != 12 work.
  3. Write the eval() method in BoolExp, so that boolean constants like false work.
  4. Write the eval() methods in BoolOp and NotOp, so that the and, or, and not operators work.

7 Variables

Look at the SymbolTable class in the starter code file st.hpp. There is a global SymbolTable object ST that will be used to implement a single, global scope. (Don't worry! We'll implement proper scoping in the coming labs.) Read and understand what the three public methods of the class do.

Exercises

  1. Get variable declaration and assignment working using new and :=. This means implementing the exec() methods in classes NewStmt and Asn.
    You should print helpful error messages and set the error flag to true if a variable is declared twice (two new's) or if it's assigned before it's declared. For example:
    spl> new x := 5;  # This works fine
    spl> x := 20;     # This is fine too
    spl> y := 20;
    ERROR: Can't rebind y; not yet bound!
    spl> new x := 30;
    ERROR: Variable x already bound!
    (You will have to use the methods of the SymbolTable class in st.hpp. You might have to change some methods in that class to get all the error handling working correctly.)
  2. Get variable reference working in the spl interpreter program. A program like
    new x := 5;
    x := x + 10;
    write x;
    should work and produce 15.
    This means implementing eval() in the Id class.

8 Control Structures

It looks like there are three control structures in SPL: if, ifelse, and while. But actually these are only two, because an if and an ifelse statement in the code both become a single IfStmt object in the AST. The only difference is that the else block might be NullStmt if the first syntax was used.

To get these structures working, you first need to implement the exec() method in the Block class. You may have noticed that blocks with curly braces produce valid ASTs, but still make the interpreter give up. Well, a bunch of statements surrounded by curly braces is a Block, and that's where you will start.

Exercises

  1. Get blocks of statements in curly braces working. This means writing the exec() method in the Block class, so that a program like
    { new var := 10;
      var := var * var;
      write var;
    }
    works correctly and prints out 100. HINT: This should be a two-liner!
  2. Get ifs and ifelses working. This means writing the exec() method in the IfStmt class, so that a program like
    ifelse 40 < 11
      { write true; }
      { write false;
        if 2 != 2 { write 2; }
      }
    works correctly and prints out false.
  3. Get while statements working. This means writing the exec() method in the WhileStmt class, so that a program like
    new i := 3;
    while i > 0 {
      write i;
      i := i - 1;
    }
    works correctly and prints out
    3
    2
    1

9 Last one: read

Okay, now just about everything is working except for functions (which we'll deal with next class) and read. In SPL, read is an expression, which means that it returns a value (namely, whatever the user types in).

Reading input can create some awkward problems because your actual code (the SPL program that is being interpreted) is coming from stdin, but so is the input to that program when you have read expressions. To help make this a little less confusing, print out a read> prompt when the read statement executes. Then a run of your interpreter might look like:

spl> new secret := 42;
spl> new numtries := 0;
spl> while read != secret { numtries := numtries + 1; }
read> 15
read> -3
read> 21
read> 42
spl> write numtries;
3

Exercises

  1. Get the read command working. Typing
    new x := 4; x := read;
    into your interpreter should cause it to wait until an integer is typed in. Then writing
    write x;
    should echo that typed-in integer back to the screen. This means writing the eval() method in the Read class.
  2. Now your interpreter should have everything you need to run your program from Exercise #1! If you give a filename as a command-line argument to the spl interpreter, it reads and executes SPL code from that file instead of reading in the code from the terminal. So you should be able to do:
    roche@ubuntu$ ./spl ex1.spl
    read> 5
    read> 6
    read> 12
    read> 0
    23
    There is nothing to hand in or to test for this part but rest assured that I will test it when you submit your code! (Plus it should feel damn good to get this working.)

10 Two OPTIONAL extensions

If you've gotten this far, great job! Let's add two more nice features that will make programming more pleasant: short-circuit evaluation, and debugging statements.

Short-circuit evaluation means cutting off a boolean expression with ands and ors as soon as we know the answer. For instance, since false and XXXX must be false no matter what XXXX is, we can just return false from this without actually evaluating the second argument.

Debugging statements are little snippets of text that print to the screen when a certain part of your code is reached. Since SPL doesn't have any support for strings, we have to add a language feature to support this. Our approach will be that anything enclosed in double-quotes is a debugging statement. This will require a new AST Stmt node class called Debug, which you will have to create.

Some specifics on the debug statements

  • The debug statements can go in between other statements, but not in the middle. So write 5; "hello!" write 6; would be OK, but not write "the interruptor!" 5;.
  • Debug statements do not need to be followed by a semicolon.
  • Because debug statements are going into your AST, they should be executed every time that part of the program is executed. So for example, running
    if false { "toodaloo" }
    should not result in any output.

Exercises

  1. (OPTIONAL) Get short-circuit evaluation working correctly for boolean operators. Expressions like the following should then work:
    write false and 1/0;
    (would have given division-by-zero error before), and
    write true or what;
    (would have given undefined variable error before).
  2. (OPTIONAL) Add in debugging statements as described above. Besides the new Debug class in ast.hpp, you will also need a new token type, a new scanner rule, and a new parser rule. But it'll be worth it, I swear!