Lab 7: Abstract Syntax Trees
This lab is due at 2359 on Wednesday, 18 October. It should be submitted to the course SI413 as Lab07, with the files named properly according to the lab instructions. See the submit page for details. You are highly encouraged to submit whatever you have done by the end of this lab time. Remember that you must thoroughly test your code; the tests that are visible before the deadline are only there to make sure you are using the right names and input/output formats.
1 Starter Code
This lab starts a new group of labs. This means two things. First, you need to choose a new lab partner and make sure to notate that on the shared Google sheet.
Second, you need to create a new repo for your shared work running the following command. (One lab partner should run this command first to create the shared repo, and then the other lab partner should run it next to clone the repo.)
roche@ubuntu$
413repo spl
If that command doesn't work, check the initial setup instructions.
You should now have a folder like
si413/spl/lab07
with the starter code for this lab, the
lab07.md
file you will need to fill out,
and a handy script to submit your code.
You can also browse the starter code from the Gitlab page (USNA intranet only).
2 Introduction
My big hint for the lab:
For this lab, the secret is to work carefully and test constantly after each small step.
There are a lot of methods you need to write, but nothing individually is very complicated at all. What makes it challenging is the scale and the way all the parts interact.
By working for simplicity and constantly testing your own code, you will have the best chance of success and hopefully even a little fun. Don't be intimidated - you are ready for this. Good luck!
From today's lab all the way until the end of the semester, we will be working on interpreting and compiling the SPL language that we introduced last week. Here's how it will happen:
- Lab 7 (today!): Interpreting an Abstract Syntax Tree.
We combine what we learned in Lab 5 (introduction to ANTLR and Maven) with what we did last week in Lab 6 (intro to SPL) to build the beginnings of a real interpreter of SPL. Specifically, we will be able to interpret all language features except functions, using a single global scope. - Lab 8: Functions and Lexical scoping with frames.
We will implement the one feature missing from today's lab - functions! This requires us to implement lexical scoping using activation frames. That allows our language to support advanced functional programming features using closures, like we saw in Scheme. - Lab 9: Type checking and Builtins
We will add both dynamic and static type-checking features to our interpreter to support richer feedback and reliability for the SPL programmer. We will also introduce the idea of builtin functions, which are special functions that "look like" any other SPL function, but in fact operate quite differently under the hood. - Labs 10 and 11: Compiler.
This exciting two-lab sequence will be our bief foray into actual compilation. We will produce LLVM intermediate expression code, the same thing that is used by the popular clang/clang++ compilers to compile C and C++ programs. As a result, you will have a complete working compiler that turns a .spl file in our made-up language into an actual executable. - Lab 12: Garbage collection.
To close off the semester on a "lighter" note, automatic garbage collection will be added to our interpreter using the mark-and-sweep paradigm, to prevent memory leaks and make our interpreter more robust.
3 The SPL interpreter
As you may have noticed, the starter code for today's lab is fairly extensive. This is intimidating, because you're going to be modifying a piece of software that you didn't create and that you might not fully understand yet. But the good news is that you don't have to write everything from scratch! I suggest you dive in with the exercises below and don't worry about parts of the code until you need to. As always, ask your instructor as you encounter difficulties!
3.1 Interpreter overview
Here's an overview of how the interpreter works. This is a process you should be pretty familiar with by now, although you haven't seen it all in working Java code before:
- Scanning
- Parsing
- AST creation
- AST execution
Based on the regex specifications in src/main/antlr4/si413/spl/SPLLexer.g4
ANTLR generates a SPLLexer
class that can turn any file (or character stream)
into a stream of SPL tokens such as ID
, NUM
, OPA
, etc.
Similarly to scanning, ANTLR reads the grammar specified in
src/main/antlr4/si413/spl/SPLParser.g4
and generates a SPLParser
class that can turn a token stream
into a parse tree. The nodes of the parse tree correspond to the labels
in the .g4
file like NewVar
, IfStmt
,
Num
, Read
, etc.
This part is new, and it's what you will complete for the first part of today's lab.
The functions in the three "builder" classes
ExpressionBuilder
, StatementBuilder
, and StlistBuilder
(all found in src/main/java/si413/spl
)
turn parse tree nodes into AST nodes.
As we learned in class the Abstract Syntax Tree (AST) is not exactly the same as the parse tree, but is a more general abstraction that also strips away purely syntactic stuff like semicolons and parentheses.
The AST classes are in a subpackage si413.spl.ast
, and they have names like
Block
, Write
, BoolOp
, etc.
An SPL program consists of a series of statements, which in terms of the AST
are all classes that are a subclass of si413.spl.ast.Statement
.
Each Statement class has a very important method execute()
which makes that
statement "run" or do whatever it is supposed to do. To execute an AST, we just run the
execute()
method for the statement at the top of the AST.
Within statements, in SPL (or any other programming langauge) we also have expressions,
which are program fragments that can be evaluated to produce a result.
In SPL, these are all subclasses of si413.spl.ast.Expression
and instead of an
execute method, they all have a method evaluate()
that returns an integer.
3.2 What works already - and how
Let's see what already works in the SPL interpreter and walk through how the interpretation happens specifically in code.
You can fire up the interpreter by first compiling via mvn compile
and then running ./spli.sh
. Right now boolean constants (true
and
false
), boolean operations (and
, or
, not
),
and write
statements are the only things that actually work.
For example, try this:
roche@ubuntu$
./spli.sh
spl>
write true and not false;
1
Notice that everything (for now) is an integer, so true and false are represented by 1 and 0 respectively. (This will change in a few weeks!)
Now let's think about exactly what happens for SPL to evaluate the program
write true and not false;
:
- Scanning:
According to theSPLLexer.g4
specification, this translates to the token streamWRITE BOOL BOP NOT BOOL STOP
This part is already completely written for you and doesn't need to be changed. - Parsing:
According to theSPLParser.g4
specification, the token stream is interpreted as the following parse tree.
This part is already completely written for you and doesn't need to be changed, although you will need to understand it for the next part. - AST Creation
From that parse tree, the methods in the three Builder classes create AST node objects to form the Abstract Syntax Tree.
The building works recursively from the bottom-up. In the above example, the first AST nodes created will be from the leaves, from the following method in theExpressionBuilder
class:
Make sure you undertand what is happening. This is taking a parse tree node (that is tagged in the grammar with@Override public Bool visitBool(SPLParser.BoolContext ctx) { String literal = ctx.BOOL().getText(); if (literal.equals("true")) return new Bool(true); else return new Bool(false); }
#Bool
), and returning an AST node with typesi413.spl.ast.Bool
(you can find it insrc/main/java/si413/spl/ast/Bool.java
).
It's really not doing very much, and that's okay! Many of these methods will be very straightforward, just making a few recursive calls tovisit()
and then calling a constructor to return the new AST node.
The next level up in the AST is built by this method, also inExpressionBuilder
:
Notice that it is making two recursive calls to@Override public BoolOp visitBoolOp(SPLParser.BoolOpContext ctx) { Expression lhs = visit(ctx.exp(0)); Expression rhs = visit(ctx.exp(1)); String op = ctx.BOP().getText(); return new BoolOp(lhs, op, rhs); }
visit()
to get the Expression nodes for both sides of the "and" or "or" operation.
We can see the entire AST for this statement by runningspli.sh
with the-t
tag, like this:
Each of these nodes (Write, BoolOp, Bool, NotOp) is an actual Java class inside theroche@ubuntu$
./spli.sh -t
Fancy SPL Interpreter
Enter SPL statements. Ctrl-D to exit gracefully.
spl>
write true and not false;
Write
└──BoolOp:and
├──Bool:true
└──NotOp
└──Bool:false
ast
subpackage. Check them out! - AST Execution:
Finally, to actually run this small SPL program, involves calling theexecute()
method on the top statement. For the Write AST node, we can see this method defined insrc/main/java/si413/spl/ast/Write.java
:
Once again, notice that the methods are almost always going to be short and relatively simple! But there are a lot of them that you will have to write.@Override public void execute() { int result = exp.evaluate(); Interpreter.current().write(result); }
In this case, to execute the Write statement, we have to recursively evaluate the expression underneath it (in this case, that's a field in the class calledexp
), and then print the resulting value to the screen.
It's important to useInterpreter.current().write(...)
rather than System.out so that we can auto-test your code and run it from files or strings etc. But fortunately, this is really the only method that should print anything out, and it's already written for you!
To fully understand the above example, you should also check out theevaluate()
methods inBoolOp.java
,NotOp.java
, andBool.java
.
3.3 Your tasks
Your task in this lab is to do two things: (1)
complete the StatementBuilder.java
and
ExpressionBuilder.java
classes to turn any SPL parse tree
into a complete, valid AST; and (2) complete the execute()
or evaluate()
methods in every Statement or Expression
AST node class (respectively) so that your interpreter actually works.
The only things you can ignore for this lab are things having to do with functions, namely lambdas (function definition) and function calls. Everything else should be working by the end of the lab. Yes, you can do it!
Note that you should only need to change those parts of the code: filling in methods in the two Builder classes, and adding evaluate/execute methods to the ast Expression and Statement subclasses. As you start to fill these in, more and more expansive SPL programs will start to work with your interpreter. It's like cooking your own birthday dinner and eating it. Yum!
As a hint, none of the coding itself is particularly hard or compilcated for this lab, but that doesn't mean it will be easy! You have to write a lot of small methods and each has a very particular task to do, which requires understanding and careful thought. So work carefully, test frequently after completing each small step, and use your lab partner to discuss and make sure you are on the right track.
The exercises below give a nice ordering and hints on completing the implementation. Good luck!
4 AST creation
You can run ./spli.sh
with the -d
("don't execute")
and -t
("show Tree")
options just to see the Abstract Syntax Tree for each statement you enter.
Right now, write statements that use boolean constance true/false, and/or operations, and not expressions, will work; these have already been written for you in the starter code.
If you try to do anything else like the expressions below, you will get null
pointer exceptions. That means that there is a method missing in your
StatementBuilder
or ExpressionBuilder
classes to turn
the parse tree node into an AST node!
For this part, complete the StatementBuilder and ExpressionBuilder classes so that you can see the ASTs for statements like these:
roche@ubuntu$
./spli.sh -d -t
Fancy SPL Interpreter
Enter SPL statements. Ctrl-D to exit gracefully.
spl>
write 10;
Write
└──Num:10
spl>
new x := 13;
NewStmt:x
└──Num:13
spl>
write x * 6 + 5;
Write
└──ArithOp:+
├──ArithOp:*
│ ├──Id:x
│ └──Num:6
└──Num:5
spl>
write 1 < 2;
Write
└──CompOp:<
├──Num:1
└──Num:2
spl>
ifelse not true { x := 100; } { y := 200; }
IfElse
├──NotOp
│ └──Bool:true
├──Block
│ └──Asn:x
│ └──Num:100
└──Block
└──Asn:y
└──Num:200
spl>
while var < 100 { new blah := var * 10; }
WhileStmt
├──CompOp:<
│ ├──Id:var
│ └──Num:100
└──Block
└──NewStmt:blah
└──ArithOp:*
├──Id:var
└──Num:10
spl>
if 3 = 4 { write 10; write 11; write 12; }
IfElse
├──CompOp:=
│ ├──Num:3
│ └──Num:4
├──Block
│ ├──Write
│ │ └──Num:10
│ ├──Write
│ │ └──Num:11
│ └──Write
│ └──Num:12
└──Block
Notice that there are plenty of logical errors above like undefined variables and infinite loops - that has nothing to do with just creating the AST which should still work.
Here are a few tips which may be helpful in (quickly) getting this part done:
- You only need to be editing (adding methods to) the ExpressionBuilder and StatementBuilder classes for this part.
-
To know which methods to add, look at the parser specification grammar in
src/main/antlr4/si413/spl/SPLParser.g4
. Each kind ofstmt
in the grammer will have a method in StatementBuilder, and each kind ofexp
in the grammar will have a method in ExpressionBuilder. - There are a few methods that are already provided for you in the starter code. Make sure you understand what those are doing. The ones you write will mostly be very similar to these, and short! (If you are writing more than a few lines of code for each method, there is probably a better way!)
-
Most of the builder methods will be constructing and
returning a new AST node. These classes are in the folder
src/main/java/si413/spl/ast/
. Look at the source code there to see what arguments are needed for each constructor. - A few of the methods don't actually need to create a new AST node, but can just return an AST node that was already created. For example, there is a grammar rule for parentheses (which means you have to write a method for it in the ExpressionBuilder class), but there is of course no parentheses node in the AST. In this case, you will want to just get the existing AST node for the expression inside the parentheses and return it.
-
Similarly, the NegOp rule in the grammar allows any +/- in front of another expression.
So in SPL it is valid to write something like
write ++--++-+-3;
. When the operator is -, you need to create a new NegOp node. But when the operator is +, that doesn't actually "do" anything and doesn't need a new node in the AST; you can just return the Expression node itself (from recursively callingvisit
) in this case. - One other slightly tricky case is for numeric arithmetic: There are two separate grammar rules for multiplication and addition because they have different precedence, but both of those should just be returning a new instance of the ArithOp class. So you will hae two methods in ExpressionBuilder, but both do (almost) exactly the same thing and return the same type of new node.
Exercises
-
Complete the ExpressionBuilder and StatementBuilder classes so that a correct AST for
any SPL program (not including lambdas or function calls)
is created.
You should test your code by running ./spli.sh
with the flags
-d
and -t
.
You can also run the tests for this part just as we will in the submit system
with the command
roche@ubuntu$
./runtest.sh Ex1Test
which uses the JUnit testing file you can see for yourself in
src/test/java/si413/spl/Ex1Test.java
5 Operations
You should test your code by running
./spli.sh
with the flags
-d
and -t
.
You can also run the tests for this part just as we will in the submit system with the command
which uses the JUnit testing file you can see for yourself inroche@ubuntu$
./runtest.sh Ex1Test
src/test/java/si413/spl/Ex1Test.java
Now is time to start getting your interpreter to work and actually run code!
For this part, you will want to implement the evaluate()
method in
the Num, NegOp, ArithOp, NegOp, and CompOp classes. Note that the starter code
already includes implementations for the Bool, NotOp, and BoolOp classes, so look there
first and use those as a guide to get you started.
Once you have this part, all of the following should work:
roche@ubuntu$
./spli.sh
spl>
write true;
1
spl>
write true and false;
0
spl>
write 10;
10
spl>
write 5 + 3;
8
spl>
write 18 - 15 / 3;
13
spl>
write -(5 + 5);
-10
spl>
write 4 > 3;
1
spl>
write 11 = 12;
0
spl>
write 6 + 3 != 2 and not 6 + 3 <= 2;
1
Remember that an operation like +
must evaluate its arguments
first before using them!
This is accomplished by calling the evaluate()
method recursively.
For instance, the first argument to a BoolOp
is evaluated
by calling lhsExp.evaluate()
.
One last thing: error handling. There is one specific kind of run-time
error that can occur in an SPL program at this point. When that happens, rather than
crashing unceremoniously with a random Java exception, your interpreter should
call Interpreter.current().error(...)
with an appropriate error message.
In particular, we should get behaviour like this when running the "fancy" interpreter:
roche@ubuntu$
./spli.sh
Fancy SPL Interpreter
Enter SPL statements. Ctrl-D to exit gracefully.
spl>
write 3 + 2 / (5 - 5);
ERROR: Divide by zero
spl>
write 100;
100
Exercises
-
Write the
evaluate()
method in the
Num
class.
At this point, a simple SPL statement like
write 123;
should work.
-
Write the
evaluate()
method in the
NegOp
class, so that something like
write -(17);
works.
(Tip: if you get NullPointerException here,
it might mean you didn't correctly handle parentheses in your ExpressionBuilder
class for the first part of the lab.)
-
Write the
evaluate()
method in the
ArithOp
class.
Now addition, subtraction, multiplication, and division (rounding down)
should work.
-
Write the
evaluate()
method in the
CompOp
class.
Take note: there are 6 comparison operators in SPL:
<
,
<=
,
=
,
>=
,
>
, and
!=
At this point, all of the examples above (plus any more like them) should work.
Like before, you can run (some of) the auto-tests yourself with this command:
roche@ubuntu$
./runtest.sh Ex2345Test
6 Variables
evaluate()
method in the
Num
class.At this point, a simple SPL statement like
write 123;
should work.
evaluate()
method in the
NegOp
class, so that something like
write -(17);
works.(Tip: if you get NullPointerException here, it might mean you didn't correctly handle parentheses in your ExpressionBuilder class for the first part of the lab.)
evaluate()
method in the
ArithOp
class.
Now addition, subtraction, multiplication, and division (rounding down) should work.
evaluate()
method in the
CompOp
class.
Take note: there are 6 comparison operators in SPL:
<
,
<=
,
=
,
>=
,
>
, and
!=
At this point, all of the examples above (plus any more like them) should work.
Like before, you can run (some of) the auto-tests yourself with this command:
roche@ubuntu$
./runtest.sh Ex2345Test
Look at the Frame
class in the starter code file
Frame.java.
There are tree crucial methods there: bind, rebind, and lookup.
Make sure you undertand what they do!
For this lab, we will use a single global Frame, which
is accessible from the current interpreter by calling
Interpreter.current().getGlobal()
.
Once you have all of this, these kinds of statements should work:
roche@ubuntu$
./spli.sh
spl>
new x := 5;
spl>
write x;
5
spl>
new other := x * 3;
spl>
write other;
15
spl>
x := -10;
spl>
write x + other;
5
But there are also some things which should not work. Namely, when we try to make a new variable that already exists, reassign a variable that doesn't exist, or reference a variable that doesn't exist:
roche@ubuntu$
./spli.sh
spl>
write what * 3;
ERROR: No binding for variable 'what'
spl>
new what := 20;
spl>
new what := 100;
ERROR: Cannot bind 'what', already set to '20'
spl>
ok := 17;
ERROR: Variable 'ok' not yet bound; cannot rebind
Exercises
-
Get variable declaration, assignment, and reassignment
working by implementing the
execute()
methods
in the
NewStmt
and
Asn
classes.
At this point, you should be able to assign variables, but you won't
really have any way of checking the values.
Use the global frame to store variable values,
all of which (for now) will just be integers.
-
Get variable lookup working
by implementing the
evaluate()
method in the
and Id
class.
At this point, all of the above examples should work correctly.
You can run (some of) the auto-tests yourself with this command:
roche@ubuntu$
./runtest.sh Ex67Test
7 Control Structures
execute()
methods
in the
NewStmt
and
Asn
classes.
At this point, you should be able to assign variables, but you won't really have any way of checking the values.
Use the global frame to store variable values, all of which (for now) will just be integers.
evaluate()
method in the
and Id
class.
At this point, all of the above examples should work correctly.
You can run (some of) the auto-tests yourself with this command:
roche@ubuntu$
./runtest.sh Ex67Test
It looks like there are three control structures in SPL:
if
, ifelse
, and while
.
But actually these are only two, because an if
and an
ifelse
statement in the code both become a single
IfStmt
object in the AST. The only difference is that
the else block might be NullStmt
if the first syntax
was used.
To get these structures working, you first need to implement the
execute()
method in the Block
class.
You may have noticed that blocks with curly braces produce valid ASTs,
but still make the interpreter give up. Well, a bunch of statements surrounded
by curly braces is a Block
, and that's where you will start.
Exercises
-
Get blocks of statements in curly braces working.
This means writing the
execute()
method in the
Block
class, so that a program like
{ new var := 10;
var := var * var;
write var;
}
works correctly and prints out 100.
(Hint: if you are getting NullPointerExceptions, it probably means you forgot
some AST creation in the StatementBuilder class.)
-
Get
if
s and ifelse
s working.
This means writing the execute()
method in the
IfStmt
class, so that a program like
ifelse 40 < 11
{ write true; }
{ write false;
if 2 != 2 { write 2; }
}
works correctly and prints out 0 (for false).
-
Get
while
statements working.
This means writing the execute()
method in the
WhileStmt
class, so that a program like
new i := 3;
while i > 0 {
write i;
i := i - 1;
}
works correctly and prints out
3
2
1
You can run (some of) the auto-tests yourself with this command:
roche@ubuntu$
./runtest.sh Ex8910Test
8 Last one: read
execute()
method in the
Block
class, so that a program like
{ new var := 10;
var := var * var;
write var;
}
works correctly and prints out 100.
(Hint: if you are getting NullPointerExceptions, it probably means you forgot some AST creation in the StatementBuilder class.)
if
s and ifelse
s working.
This means writing the execute()
method in the
IfStmt
class, so that a program like
ifelse 40 < 11
{ write true; }
{ write false;
if 2 != 2 { write 2; }
}
works correctly and prints out 0 (for false).
while
statements working.
This means writing the execute()
method in the
WhileStmt
class, so that a program like
new i := 3;
while i > 0 {
write i;
i := i - 1;
}
works correctly and prints out
3
2
1
You can run (some of) the auto-tests yourself with this command:
roche@ubuntu$
./runtest.sh Ex8910Test
read
Okay, now just about everything is working except for functions
(which we'll deal with next lab) and read
.
In SPL, read
is an expression, which means that it
returns a value (namely, whatever the user types in).
Reading input can create some awkward problems because your actual
code (the SPL program that is being interpreted) is coming from stdin, but
so is the input to that program when you have read
expressions.
So, similarly to printing, you shouldn't read directly using System.in but
rather using Interpreter.current().read(...)
. Take a look at the
Interpreter.java
interface for the documentation.
Now a very cool program like this should work:
spl>
new secret := 42;
spl>
new numtries := 0;
spl>
while read != secret { numtries := numtries + 1; }
read>
15
read>
-3
read>
21
read>
42
spl>
write numtries;
3
Exercises
-
Get the
read
command working. Typing
new x := 4; x := read;
into your interpreter should cause it to wait until an integer is
typed in. Then writing write x;
should echo that
typed-in integer back to the screen.
This means writing the eval()
method in the Read
class.
You can run (some of) the auto-tests yourself with this command:
roche@ubuntu$
./runtest.sh Ex11Test
9 Two OPTIONAL extensions
read
command working. Typing
new x := 4; x := read;into your interpreter should cause it to wait until an integer is typed in. Then writing
write x;should echo that typed-in integer back to the screen. This means writing the
eval()
method in the Read
class.
You can run (some of) the auto-tests yourself with this command:
roche@ubuntu$
./runtest.sh Ex11Test
If you've gotten this far, great job! Let's add two more nice features that will make programming more pleasant: short-circuit evaluation, and debugging statements.
Short-circuit evaluation means cutting off a boolean expression
with and
s and or
s as soon as we know the answer.
For instance, since
false and XXXX
must be false no matter what XXXX
is,
we can just return false
from this without actually evaluating
the second argument.
Debugging statements are little snippets of text that print to the
screen when a certain part of your code is reached. Since SPL doesn't
have any support for strings, we have to add a language feature to support
this. Our approach will be that anything enclosed in double-quotes is
a debugging statement. This will require a new AST Stmt
node
class
called Debug
, which you will have to create.
Some specifics on the debug statements
- The debug statements can go in between other statements, but not in the middle.
So
write 5; "hello!" write 6;
would be OK, but notwrite "the interruptor!" 5;
. - Debug statements do not need to be followed by a semicolon.
- Because debug statements are going into your AST, they should be executed
every time that part of the program is executed. So for example, running
should not result in any output.if false { "toodaloo" }
Exercises
-
(OPTIONAL)
Get short-circuit evaluation working correctly for boolean operators.
Expressions like the following should then work:
write false and 1/0;
(would have given division-by-zero error before), and
write true or what;
(would have given undefined variable error before).
You can run (some of) the auto-tests yourself with this command:
roche@ubuntu$
./runtest.sh Ex12Test
-
(OPTIONAL)
Add in debugging statements as described above.
Because the ANTLR regex syntax is a little strange, we already have a token
for DEBUG statements in the SPLLexer.g4 file.
But you will still need add a new parser rule, a new StatementBuilder method,
and a new AST class, to get this working.
Each of these things should be short, but you really have to understand
the whole lab to get it done. It's well worth your time, I swear!
You can run (some of) the auto-tests yourself with this command:
roche@ubuntu$
./runtest.sh Ex13Test
write false and 1/0;
(would have given division-by-zero error before), and
write true or what;
(would have given undefined variable error before).
You can run (some of) the auto-tests yourself with this command:
roche@ubuntu$
./runtest.sh Ex12Test
Each of these things should be short, but you really have to understand the whole lab to get it done. It's well worth your time, I swear!
You can run (some of) the auto-tests yourself with this command:
roche@ubuntu$
./runtest.sh Ex13Test