SI 413 Fall 2023 / Labs


This is the archived website of SI 413 from the Fall 2023 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

Lab 7: Abstract Syntax Trees

This lab is due at 2359 on Wednesday, 18 October. It should be submitted to the course SI413 as Lab07, with the files named properly according to the lab instructions. See the submit page for details. You are highly encouraged to submit whatever you have done by the end of this lab time. Remember that you must thoroughly test your code; the tests that are visible before the deadline are only there to make sure you are using the right names and input/output formats.

1 Starter Code

This lab starts a new group of labs. This means two things. First, you need to choose a new lab partner and make sure to notate that on the shared Google sheet.

Second, you need to create a new repo for your shared work running the following command. (One lab partner should run this command first to create the shared repo, and then the other lab partner should run it next to clone the repo.)

roche@ubuntu$ 413repo spl

If that command doesn't work, check the initial setup instructions.

You should now have a folder like si413/spl/lab07 with the starter code for this lab, the lab07.md file you will need to fill out, and a handy script to submit your code.

You can also browse the starter code from the Gitlab page (USNA intranet only).

2 Introduction

My big hint for the lab:

For this lab, the secret is to work carefully and test constantly after each small step.

There are a lot of methods you need to write, but nothing individually is very complicated at all. What makes it challenging is the scale and the way all the parts interact.

By working for simplicity and constantly testing your own code, you will have the best chance of success and hopefully even a little fun. Don't be intimidated - you are ready for this. Good luck!

From today's lab all the way until the end of the semester, we will be working on interpreting and compiling the SPL language that we introduced last week. Here's how it will happen:

  • Lab 7 (today!): Interpreting an Abstract Syntax Tree.
    We combine what we learned in Lab 5 (introduction to ANTLR and Maven) with what we did last week in Lab 6 (intro to SPL) to build the beginnings of a real interpreter of SPL. Specifically, we will be able to interpret all language features except functions, using a single global scope.
  • Lab 8: Functions and Lexical scoping with frames.
    We will implement the one feature missing from today's lab - functions! This requires us to implement lexical scoping using activation frames. That allows our language to support advanced functional programming features using closures, like we saw in Scheme.
  • Lab 9: Type checking and Builtins
    We will add both dynamic and static type-checking features to our interpreter to support richer feedback and reliability for the SPL programmer. We will also introduce the idea of builtin functions, which are special functions that "look like" any other SPL function, but in fact operate quite differently under the hood.
  • Labs 10 and 11: Compiler.
    This exciting two-lab sequence will be our bief foray into actual compilation. We will produce LLVM intermediate expression code, the same thing that is used by the popular clang/clang++ compilers to compile C and C++ programs. As a result, you will have a complete working compiler that turns a .spl file in our made-up language into an actual executable.
  • Lab 12: Garbage collection.
    To close off the semester on a "lighter" note, automatic garbage collection will be added to our interpreter using the mark-and-sweep paradigm, to prevent memory leaks and make our interpreter more robust.

3 The SPL interpreter

As you may have noticed, the starter code for today's lab is fairly extensive. This is intimidating, because you're going to be modifying a piece of software that you didn't create and that you might not fully understand yet. But the good news is that you don't have to write everything from scratch! I suggest you dive in with the exercises below and don't worry about parts of the code until you need to. As always, ask your instructor as you encounter difficulties!

3.1 Interpreter overview

Here's an overview of how the interpreter works. This is a process you should be pretty familiar with by now, although you haven't seen it all in working Java code before:

  1. Scanning
  2. Based on the regex specifications in src/main/antlr4/si413/spl/SPLLexer.g4 ANTLR generates a SPLLexer class that can turn any file (or character stream) into a stream of SPL tokens such as ID, NUM, OPA, etc.

  3. Parsing
  4. Similarly to scanning, ANTLR reads the grammar specified in src/main/antlr4/si413/spl/SPLParser.g4 and generates a SPLParser class that can turn a token stream into a parse tree. The nodes of the parse tree correspond to the labels in the .g4 file like NewVar, IfStmt, Num, Read, etc.

  5. AST creation
  6. This part is new, and it's what you will complete for the first part of today's lab.

    The functions in the three "builder" classes ExpressionBuilder, StatementBuilder, and StlistBuilder (all found in src/main/java/si413/spl) turn parse tree nodes into AST nodes.

    As we learned in class the Abstract Syntax Tree (AST) is not exactly the same as the parse tree, but is a more general abstraction that also strips away purely syntactic stuff like semicolons and parentheses.

    The AST classes are in a subpackage si413.spl.ast, and they have names like Block, Write, BoolOp, etc.

  7. AST execution
  8. An SPL program consists of a series of statements, which in terms of the AST are all classes that are a subclass of si413.spl.ast.Statement. Each Statement class has a very important method execute() which makes that statement "run" or do whatever it is supposed to do. To execute an AST, we just run the execute() method for the statement at the top of the AST.

    Within statements, in SPL (or any other programming langauge) we also have expressions, which are program fragments that can be evaluated to produce a result. In SPL, these are all subclasses of si413.spl.ast.Expression and instead of an execute method, they all have a method evaluate() that returns an integer.

3.2 What works already - and how

Let's see what already works in the SPL interpreter and walk through how the interpretation happens specifically in code.

You can fire up the interpreter by first compiling via mvn compile and then running ./spli.sh. Right now boolean constants (true and false), boolean operations (and, or, not), and write statements are the only things that actually work.

For example, try this:

roche@ubuntu$ ./spli.sh
spl> write true and not false;
1

Notice that everything (for now) is an integer, so true and false are represented by 1 and 0 respectively. (This will change in a few weeks!)

Now let's think about exactly what happens for SPL to evaluate the program write true and not false;:

  1. Scanning:
    According to the SPLLexer.g4 specification, this translates to the token stream
    WRITE BOOL BOP NOT BOOL STOP
    This part is already completely written for you and doesn't need to be changed.
  2. Parsing:
    According to the SPLParser.g4 specification, the token stream is interpreted as the following parse tree.

    This part is already completely written for you and doesn't need to be changed, although you will need to understand it for the next part.
  3. AST Creation
    From that parse tree, the methods in the three Builder classes create AST node objects to form the Abstract Syntax Tree.
    The building works recursively from the bottom-up. In the above example, the first AST nodes created will be from the leaves, from the following method in the ExpressionBuilder class:
    @Override
    public Bool visitBool(SPLParser.BoolContext ctx) {
        String literal = ctx.BOOL().getText();
        if (literal.equals("true")) return new Bool(true);
        else return new Bool(false);
    }
    Make sure you undertand what is happening. This is taking a parse tree node (that is tagged in the grammar with #Bool), and returning an AST node with type si413.spl.ast.Bool (you can find it in src/main/java/si413/spl/ast/Bool.java).
    It's really not doing very much, and that's okay! Many of these methods will be very straightforward, just making a few recursive calls to visit() and then calling a constructor to return the new AST node.
    The next level up in the AST is built by this method, also in ExpressionBuilder:
    @Override
    public BoolOp visitBoolOp(SPLParser.BoolOpContext ctx) {
        Expression lhs = visit(ctx.exp(0));
        Expression rhs = visit(ctx.exp(1));
        String op = ctx.BOP().getText();
        return new BoolOp(lhs, op, rhs);
    }
    Notice that it is making two recursive calls to visit() to get the Expression nodes for both sides of the "and" or "or" operation.
    We can see the entire AST for this statement by running spli.sh with the -t tag, like this:
    roche@ubuntu$ ./spli.sh -t
    Fancy SPL Interpreter
    Enter SPL statements. Ctrl-D to exit gracefully.
    spl> write true and not false;
    Write
    └──BoolOp:and
       ├──Bool:true
       └──NotOp
          └──Bool:false
    Each of these nodes (Write, BoolOp, Bool, NotOp) is an actual Java class inside the ast subpackage. Check them out!
  4. AST Execution:
    Finally, to actually run this small SPL program, involves calling the execute() method on the top statement. For the Write AST node, we can see this method defined in src/main/java/si413/spl/ast/Write.java:
    @Override
    public void execute() {
        int result = exp.evaluate();
        Interpreter.current().write(result);
    }
    Once again, notice that the methods are almost always going to be short and relatively simple! But there are a lot of them that you will have to write.
    In this case, to execute the Write statement, we have to recursively evaluate the expression underneath it (in this case, that's a field in the class called exp), and then print the resulting value to the screen.
    It's important to use Interpreter.current().write(...) rather than System.out so that we can auto-test your code and run it from files or strings etc. But fortunately, this is really the only method that should print anything out, and it's already written for you!
    To fully understand the above example, you should also check out the evaluate() methods in BoolOp.java, NotOp.java, and Bool.java.

3.3 Your tasks

Your task in this lab is to do two things: (1) complete the StatementBuilder.java and ExpressionBuilder.java classes to turn any SPL parse tree into a complete, valid AST; and (2) complete the execute() or evaluate() methods in every Statement or Expression AST node class (respectively) so that your interpreter actually works.

The only things you can ignore for this lab are things having to do with functions, namely lambdas (function definition) and function calls. Everything else should be working by the end of the lab. Yes, you can do it!

Note that you should only need to change those parts of the code: filling in methods in the two Builder classes, and adding evaluate/execute methods to the ast Expression and Statement subclasses. As you start to fill these in, more and more expansive SPL programs will start to work with your interpreter. It's like cooking your own birthday dinner and eating it. Yum!

As a hint, none of the coding itself is particularly hard or compilcated for this lab, but that doesn't mean it will be easy! You have to write a lot of small methods and each has a very particular task to do, which requires understanding and careful thought. So work carefully, test frequently after completing each small step, and use your lab partner to discuss and make sure you are on the right track.

The exercises below give a nice ordering and hints on completing the implementation. Good luck!

4 AST creation

You can run ./spli.sh with the -d ("don't execute") and -t ("show Tree") options just to see the Abstract Syntax Tree for each statement you enter.

Right now, write statements that use boolean constance true/false, and/or operations, and not expressions, will work; these have already been written for you in the starter code.

If you try to do anything else like the expressions below, you will get null pointer exceptions. That means that there is a method missing in your StatementBuilder or ExpressionBuilder classes to turn the parse tree node into an AST node!

For this part, complete the StatementBuilder and ExpressionBuilder classes so that you can see the ASTs for statements like these:

roche@ubuntu$ ./spli.sh -d -t
Fancy SPL Interpreter
Enter SPL statements. Ctrl-D to exit gracefully.
spl> write 10;
Write
└──Num:10
spl> new x := 13;
NewStmt:x
└──Num:13
spl> write x * 6 + 5;
Write
└──ArithOp:+
   ├──ArithOp:*
   │  ├──Id:x
   │  └──Num:6
   └──Num:5
spl> write 1 < 2;
Write
└──CompOp:<
   ├──Num:1
   └──Num:2
spl> ifelse not true { x := 100; } { y := 200; }
IfElse
├──NotOp
│  └──Bool:true
├──Block
│  └──Asn:x
│     └──Num:100
└──Block
   └──Asn:y
      └──Num:200
spl> while var < 100 { new blah := var * 10; }
WhileStmt
├──CompOp:<
│  ├──Id:var
│  └──Num:100
└──Block
   └──NewStmt:blah
      └──ArithOp:*
         ├──Id:var
         └──Num:10
spl> if 3 = 4 { write 10; write 11; write 12; }
IfElse
├──CompOp:=
│  ├──Num:3
│  └──Num:4
├──Block
│  ├──Write
│  │  └──Num:10
│  ├──Write
│  │  └──Num:11
│  └──Write
│     └──Num:12
└──Block

Notice that there are plenty of logical errors above like undefined variables and infinite loops - that has nothing to do with just creating the AST which should still work.

Here are a few tips which may be helpful in (quickly) getting this part done:

  • You only need to be editing (adding methods to) the ExpressionBuilder and StatementBuilder classes for this part.
  • To know which methods to add, look at the parser specification grammar in src/main/antlr4/si413/spl/SPLParser.g4. Each kind of stmt in the grammer will have a method in StatementBuilder, and each kind of exp in the grammar will have a method in ExpressionBuilder.
  • There are a few methods that are already provided for you in the starter code. Make sure you understand what those are doing. The ones you write will mostly be very similar to these, and short! (If you are writing more than a few lines of code for each method, there is probably a better way!)
  • Most of the builder methods will be constructing and returning a new AST node. These classes are in the folder src/main/java/si413/spl/ast/. Look at the source code there to see what arguments are needed for each constructor.
  • A few of the methods don't actually need to create a new AST node, but can just return an AST node that was already created. For example, there is a grammar rule for parentheses (which means you have to write a method for it in the ExpressionBuilder class), but there is of course no parentheses node in the AST. In this case, you will want to just get the existing AST node for the expression inside the parentheses and return it.
  • Similarly, the NegOp rule in the grammar allows any +/- in front of another expression. So in SPL it is valid to write something like write ++--++-+-3;. When the operator is -, you need to create a new NegOp node. But when the operator is +, that doesn't actually "do" anything and doesn't need a new node in the AST; you can just return the Expression node itself (from recursively calling visit) in this case.
  • One other slightly tricky case is for numeric arithmetic: There are two separate grammar rules for multiplication and addition because they have different precedence, but both of those should just be returning a new instance of the ArithOp class. So you will hae two methods in ExpressionBuilder, but both do (almost) exactly the same thing and return the same type of new node.

Exercises

  1. Complete the ExpressionBuilder and StatementBuilder classes so that a correct AST for any SPL program (not including lambdas or function calls) is created.
    You should test your code by running ./spli.sh with the flags -d and -t.
    You can also run the tests for this part just as we will in the submit system with the command
    roche@ubuntu$ ./runtest.sh Ex1Test
    which uses the JUnit testing file you can see for yourself in src/test/java/si413/spl/Ex1Test.java

5 Operations

Now is time to start getting your interpreter to work and actually run code!

For this part, you will want to implement the evaluate() method in the Num, NegOp, ArithOp, NegOp, and CompOp classes. Note that the starter code already includes implementations for the Bool, NotOp, and BoolOp classes, so look there first and use those as a guide to get you started.

Once you have this part, all of the following should work:

roche@ubuntu$ ./spli.sh
spl> write true;
1
spl> write true and false;
0
spl> write 10;
10
spl> write 5 + 3;
8
spl> write 18 - 15 / 3;
13
spl> write -(5 + 5);
-10
spl> write 4 > 3;
1
spl> write 11 = 12;
0
spl> write 6 + 3 != 2 and not 6 + 3 <= 2;
1

Remember that an operation like + must evaluate its arguments first before using them! This is accomplished by calling the evaluate() method recursively. For instance, the first argument to a BoolOp is evaluated by calling lhsExp.evaluate().

One last thing: error handling. There is one specific kind of run-time error that can occur in an SPL program at this point. When that happens, rather than crashing unceremoniously with a random Java exception, your interpreter should call Interpreter.current().error(...) with an appropriate error message.

In particular, we should get behaviour like this when running the "fancy" interpreter:

roche@ubuntu$ ./spli.sh
Fancy SPL Interpreter
Enter SPL statements. Ctrl-D to exit gracefully.
spl> write 3 + 2 / (5 - 5);
ERROR: Divide by zero
spl> write 100;
100

Exercises

  1. Write the evaluate() method in the Num class.
    At this point, a simple SPL statement like
    write 123;
    should work.
  2. Write the evaluate() method in the NegOp class, so that something like
    write -(17);
    works.
    (Tip: if you get NullPointerException here, it might mean you didn't correctly handle parentheses in your ExpressionBuilder class for the first part of the lab.)
  3. Write the evaluate() method in the ArithOp class.
    Now addition, subtraction, multiplication, and division (rounding down) should work.
  4. Write the evaluate() method in the CompOp class. Take note: there are 6 comparison operators in SPL: <, <=, =, >=, >, and !=
    At this point, all of the examples above (plus any more like them) should work.
    Like before, you can run (some of) the auto-tests yourself with this command:
    roche@ubuntu$ ./runtest.sh Ex2345Test

6 Variables

Look at the Frame class in the starter code file Frame.java. There are tree crucial methods there: bind, rebind, and lookup. Make sure you undertand what they do!

For this lab, we will use a single global Frame, which is accessible from the current interpreter by calling Interpreter.current().getGlobal().

Once you have all of this, these kinds of statements should work:

roche@ubuntu$ ./spli.sh
spl> new x := 5;
spl> write x;
5
spl> new other := x * 3;
spl> write other;
15
spl> x := -10;
spl> write x + other;
5

But there are also some things which should not work. Namely, when we try to make a new variable that already exists, reassign a variable that doesn't exist, or reference a variable that doesn't exist:

roche@ubuntu$ ./spli.sh
spl> write what * 3;
ERROR: No binding for variable 'what'
spl> new what := 20;
spl> new what := 100;
ERROR: Cannot bind 'what', already set to '20'
spl> ok := 17;
ERROR: Variable 'ok' not yet bound; cannot rebind

Exercises

  1. Get variable declaration, assignment, and reassignment working by implementing the execute() methods in the NewStmt and Asn classes.
    At this point, you should be able to assign variables, but you won't really have any way of checking the values.
    Use the global frame to store variable values, all of which (for now) will just be integers.
  2. Get variable lookup working by implementing the evaluate() method in the and Id class.
    At this point, all of the above examples should work correctly.
    You can run (some of) the auto-tests yourself with this command:
    roche@ubuntu$ ./runtest.sh Ex67Test

7 Control Structures

It looks like there are three control structures in SPL: if, ifelse, and while. But actually these are only two, because an if and an ifelse statement in the code both become a single IfStmt object in the AST. The only difference is that the else block might be NullStmt if the first syntax was used.

To get these structures working, you first need to implement the execute() method in the Block class. You may have noticed that blocks with curly braces produce valid ASTs, but still make the interpreter give up. Well, a bunch of statements surrounded by curly braces is a Block, and that's where you will start.

Exercises

  1. Get blocks of statements in curly braces working. This means writing the execute() method in the Block class, so that a program like
    { new var := 10;
      var := var * var;
      write var;
    }
    works correctly and prints out 100.
    (Hint: if you are getting NullPointerExceptions, it probably means you forgot some AST creation in the StatementBuilder class.)
  2. Get ifs and ifelses working. This means writing the execute() method in the IfStmt class, so that a program like
    ifelse 40 < 11
      { write true; }
      { write false;
        if 2 != 2 { write 2; }
      }
    works correctly and prints out 0 (for false).
  3. Get while statements working. This means writing the execute() method in the WhileStmt class, so that a program like
    new i := 3;
    while i > 0 {
      write i;
      i := i - 1;
    }
    works correctly and prints out
    3
    2
    1

    You can run (some of) the auto-tests yourself with this command:
    roche@ubuntu$ ./runtest.sh Ex8910Test

8 Last one: read

Okay, now just about everything is working except for functions (which we'll deal with next lab) and read. In SPL, read is an expression, which means that it returns a value (namely, whatever the user types in).

Reading input can create some awkward problems because your actual code (the SPL program that is being interpreted) is coming from stdin, but so is the input to that program when you have read expressions.

So, similarly to printing, you shouldn't read directly using System.in but rather using Interpreter.current().read(...). Take a look at the Interpreter.java interface for the documentation.

Now a very cool program like this should work:

spl> new secret := 42;
spl> new numtries := 0;
spl> while read != secret { numtries := numtries + 1; }
read> 15
read> -3
read> 21
read> 42
spl> write numtries;
3

Exercises

  1. Get the read command working. Typing
    new x := 4; x := read;
    into your interpreter should cause it to wait until an integer is typed in. Then writing
    write x;
    should echo that typed-in integer back to the screen. This means writing the eval() method in the Read class.
    You can run (some of) the auto-tests yourself with this command:
    roche@ubuntu$ ./runtest.sh Ex11Test

9 Two OPTIONAL extensions

If you've gotten this far, great job! Let's add two more nice features that will make programming more pleasant: short-circuit evaluation, and debugging statements.

Short-circuit evaluation means cutting off a boolean expression with ands and ors as soon as we know the answer. For instance, since false and XXXX must be false no matter what XXXX is, we can just return false from this without actually evaluating the second argument.

Debugging statements are little snippets of text that print to the screen when a certain part of your code is reached. Since SPL doesn't have any support for strings, we have to add a language feature to support this. Our approach will be that anything enclosed in double-quotes is a debugging statement. This will require a new AST Stmt node class called Debug, which you will have to create.

Some specifics on the debug statements

  • The debug statements can go in between other statements, but not in the middle. So write 5; "hello!" write 6; would be OK, but not write "the interruptor!" 5;.
  • Debug statements do not need to be followed by a semicolon.
  • Because debug statements are going into your AST, they should be executed every time that part of the program is executed. So for example, running
    if false { "toodaloo" }
    should not result in any output.

Exercises

  1. (OPTIONAL) Get short-circuit evaluation working correctly for boolean operators. Expressions like the following should then work:
    write false and 1/0;
    (would have given division-by-zero error before), and
    write true or what;
    (would have given undefined variable error before).
    You can run (some of) the auto-tests yourself with this command:
    roche@ubuntu$ ./runtest.sh Ex12Test
  2. (OPTIONAL) Add in debugging statements as described above. Because the ANTLR regex syntax is a little strange, we already have a token for DEBUG statements in the SPLLexer.g4 file. But you will still need add a new parser rule, a new StatementBuilder method, and a new AST class, to get this working.
    Each of these things should be short, but you really have to understand the whole lab to get it done. It's well worth your time, I swear!
    You can run (some of) the auto-tests yourself with this command:
    roche@ubuntu$ ./runtest.sh Ex13Test