Lab 7: Abstract Syntax Trees
This lab is due at 2359 on Wednesday, 20 October. It should be submitted to the course SI413 as Lab07, with the files named properly according to the lab instructions. See the submit page for details. You are highly encouraged to submit whatever you have done by the end of this lab time. Remember that you must thoroughly test your code; the tests that are visible before the deadline are only there to make sure you are using the right names and input/output formats.
1 Starter Code
This lab starts a new group of labs. This means two things. First, you need to choose a new lab partner and make sure to notate that on the shared Google sheet.
Second, you need to create a new repo for your shared work running the following command. (One lab partner should run this command first to create the shared repo, and then the other lab partner should run it next to clone the repo.)
roche@ubuntu$
413repo spl
If that command doesn't work, check the initial setup instructions.
You should now have a folder like
si413/spl/lab07
with the starter code for this lab and a handy script to submit your code.
You can also browse the starter code from the Gitlab page (USNA intranet only).
2 Introduction
From today's lab all the way until the end of the semester, we will be working on interpreting and compiling a made-up programming language called SPL ("Simple Programming Language"). Here's how it'll happen:
- Lab 7 (today!): Interpreting an Abstract Syntax Tree.
Gaining familiarity with OOP in C++ and the AST structure allows us to interpret all language features except functions, using a single global scope. - Lab 8: Functions and Lexical scoping with frames.
We will implement the one feature missing from today's lab - functions! This requires us to implement lexical scoping using activation frames. That allows our language to support advanced functional programming features using closures, like we saw in Scheme. - Lab 9: Type checking.
We will add both dynamic and static type-checking features to our interpreter to support richer feedback and reliability for the SPL programmer. - Lab 10: Garbage collection.
Automatic garbage collection will be added to our interpreter using the mark-and-sweep paradigm, to prevent memory leaks and make our interpreter more robust. - Labs 11 and 12: Compiler.
Finally, in this exciting double lab, we will foray briefly into actual compilation by producing LLVM intermediate expression code, the same thing that is used by the popular clang/clang++ compilers to compile C and C++ programs. As a result, you will have a complete working compile that turns a .spl file in our made-up language into an actual executable.
3 OOP in C++
We will be using object-oriented programming in C++ to write the SPL interpreter. Luckily, most of the class structure is already provided by me, and you just have to fill in some of the methods. But here's a summary of how OOP works syntactically in C++. (You should already understand how it works semantically from Java!) All of the examples below are taken from the ast.hpp file from today's starter code.
- Classes are declared using the
class
keyword. Superclasses are specified using a colon (like theextends
keyword in Java). For example
declares a class calledclass Stmt :public AST { /* fields and methods in here */ };
Stmt
that is a subclass ofAST
. Thepublic
keyword is necessary and tells us that the methods inAST
can be accessed from aStmt
object by anybody.
And don't forget that class declarations must end with a semicolon in C++! - Fields and methods in a class can be either
private
,protected
, orpublic
. We specify these by grouping declarations with the same visibility and then writing visibility specifiers before them, likepublic:
. - If we want to use polymorphism in a class method, we declare the method
with the
virtual
keyword, like
This allows sub-classes to override that method in the way that we expect from Java. (In Java, essentially every method is declaredvirtual void exec() {...}
virtual
.) - There is no special syntax for interfaces, nor is there an
abstract
keyword as in Java. Instead, to declare a method abstract (i.e., it must be overridden by subclasses), we write= 0
instead of the method body, likevirtual void ASTchild(AST* child) = 0;
- When a virtual method is overridden in a base class, the
overrides
keyword can be used to tall the compiler that you really intend to be overriding a base class method, like:
Thevoid exec() override { ... }
override
keyword is actually optional, but it's really useful because you'll get an error message if there is no matching base class method (for example, if you get the signature wrong by accident). - Constructors work similarly to Java: a constructor is a special method
with no return type, and whose name matches the name of the class.
Something different in C++ is the presence of destructors,
which are special methods with no return type, no parameters, and whose
name is a tilde character `~` followed by the name of the class.
Destructors in C++ are responsible to "clean up" when an object is being destroyed.
(You shouldn't have to change any descructors for this lab, but it's useful
to be aware of what these things are.)
Destructors are almost always declaredvirtual
, to make sure they get called on subclasses too.
Here is an example of a class with a no-argument constructor as well as a destructor.class AST { // ... various other stuff ... public: /* Makes a new "empty" AST node. */ AST() { nodeLabel = "EMPTY"; } /* Deallocates memory for this AST note and its children. */ virtual ~AST() { for (AST* child : children) delete child; children.clear(); } };
- Methods can be defined within the class declaration, like in Java.
But if we did this all the time, especially when there are many classes defined
in the same file, the file gets pretty cluttered and hard to read.
So instead of defining methods within theclass
declaration, we can just declare the method's prototype there, and fully define the method in some.cpp
file later.
For instance, the following class definition appears in ast.hpp:
Notice that there is no function body for theclass ArithOp :public Exp { public: Value eval() override; // ... other fields and methods ... };
eval
method. That is saved for the ast.cpp file:
Since this is outside the class declaration, we must specify the name of the class in front of the method name and use the scope resolution operatorValue ArithOp::eval() { int l = left->eval().num(); int r = right->eval().num(); switch(op) { case ADD: return l + r; case SUB: return l - r; case MUL: return l * r; case DIV: if (r != 0) return l / r; else if (!error) { error = true; errout << "ERROR: Divide by zero" << endl; } return Value(); default: return Value(); // shouldn't get here... } }
::
likeArithOp::eval
.
There is no set rule about when a method should be declared in the cpp file versus the hpp file, but it's probably a good idea once the method body gets longer than 4 lines or so. Another good rule of thumb is to try and keep the class declaration (in the hpp file) on one "screenfull", which will be roughly 50 lines depending on your font and editor.
4 The SPL language
SPL is an extension of the calculator language we have been using to include a few control structures as well as simple calculator instructions. The syntax is mostly similar to C++ and Java but much more limited, and there are some very important differences (as we will see). Specifically, SPL supports:
- Integers like
1234
and booleanstrue
andfalse
- Basic arithmetic (
+
,-
,*
,/
) - Numeric comparisons (
=
,!=
,>
,>=
,<
,<=
) - Boolean operations (
and
,or
,not
) - Input/output (
read
andwrite
) - Basic control structures (
if
,ifelse
, andwhile
) - Variables, declared with the
new
keyword and assigned using the:=
operator. - User-defined functions. These must be unary (single-argument),
and are defined with
lambda
statements.
NOTE: We will deal with functions in all their glory in later labs, but for this lab you don't have to worry about lambdas or function calls.
A scanner and parser for SPL have already been written for you; they are in the files spl.lpp and spl.ypp. Check them out for the full details of the SPL syntax. Here's an overview of the grammar:
res: stmt | block: LC stmtlist RC stmtlist: stmtlist stmt | stmt: NEW ID ASN exp STOP | ID ASN exp STOP | WRITE exp STOP | IF exp block | IFELSE exp block block | WHILE exp block | block exp: exp BOP exp | NOTTOK exp | exp COMP exp | exp OPA exp | exp OPM exp | OPA exp | READ | LAMBDA ID block | exp FUNARG exp | LP exp RP | ID | NUM | BOOL
You can see the token names in the grammar above. First we
have the standard punctuation: LC and RC are curly braces, LP and
RP are parentheses, and STOP is a semicolon. ID is a variable name
(which can contain letters and digits), NUM is an series of digits,
and BOOL is either true
or false
.
The operators of the language are defined by these tokens:
Token name | Operator(s) -----------|------------ ASN | := BOP | and or NOTTOK | not COMP | = != < <= > >= OPA | + - OPM | * / FUNARG | @
Finally the keywords in SPL are:
new write if ifelse while read lambda true false
Here are a few interesting aspects of SPL that you should be aware of:
- There are no declared types. Variables are declared and given an
initial value with a statement like
and afterwards, they can be reassigned using just thenew x := 10;
:=
operator without thenew
keyword. - The conditions in
if
andwhile
statements do not need to be enclosed in parentheses. For example, here is a program to find (and then print) the sum of 1 up to 100:new i := 1; new sum := 0; while i <= 100 { sum := sum + i; i := i + 1; } write sum;
- A regular if statement (with no else) looks like you might expect:
But if we have an if-else, you use the keywordif 1 < 10 { write true; }
ifelse
instead ofif
, followed by the condition and the two code blocks - no separateelse
keyword is used:ifelse false { write 100; } { write -100; }
- Function calls are a bit different than the norm. We won't really have
to deal with these until next week, but here's a primer. Every function
is a unary function (takes one argument), and return values are specified
by assigning to the special variable
ret
. So here is a function to compute the factorial of its argument:
(Did I mention? Pound signs# This is a one-argument function to compute factorials. new fact := lambda n { new prod := 1; new i := 1; while i <= n { prod := prod * i; i := i + 1; } ret := prod; };
#
start single-line comments.)
Once we have a function defined, you call it with the@
operator, likefunction @ operand
. Using the function just defined, we could print out 8 factorial with:write fact@8;
Exercises
-
Write an SPL program (without any lambdas)
that reads in integers until
0
is entered, at which time the sum of all the integers read in is printed to the screen.
Submit your program in a file calledex1.spl
. And call over your instructor to go over the AST that results when you run your program through the existingspl
program from the starter code. Note: you should enclose your program in a single block with curly braces so that it prints as a single AST.
5 The SPL interpreter
As you may have noticed, the starter code for today's lab is fairly extensive. This is intimidating, because you're going to be modifying a piece of software that you didn't create and that you might not fully understand yet. But the good news is that you don't have to write everything from scratch! I suggest you dive in with the exercises below and don't worry about parts of the code until you need to. As always, ask your instructor as you encounter difficulties!
Here's an overview of how the interpreter works. This is a process you should be pretty familiar with by now, although you haven't seen it all in working C++ code before:
- The scanner (spl.lpp) reads in the SPL program and breaks it up into tokens.
- The parser (spl.ypp) organizes these tokens according to the grammar rules. As a side-effect to parsing, an Abstract Syntax Tree (AST) is created.
-
Every node in the AST is a pointer to an object of a subclass of
AST
. The two main subclasses areStmt
for statements andExp
for expressions. EveryStmt
subclass has anexec()
method to do whatever it is that statement does, and each subclass ofExp
has aneval()
method that returns the value that that expression computes to. - The top node in the AST (which must be a
Stmt
) is executed, which recursively causes other statements to be executed and various expressions to be evaluated. All these methods are declared in the ast.hpp file.
Your task in this lab is to write all those exec
and eval
methods, except for the ones having to do
with lambdas and function calls. You can see all the types of statements
and expressions by looking at the top of the
ast.hpp file (around line 45).
Each one of those classes needs a shiny new exec()
or eval()
method, except for the following which I've
already written for you: NullStmt
,
Write
, Num
, and
ArithOp
.
The exercises below give a nice ordering and hints on completing the implementation. Good luck!
6 Operations
Since Num
, ArithOp
, and Write
have already been written for you, simple SPL programs that do some
light number crunching work already with the starter code:
spl>
write 10;
10
spl>
write 5 + 3;
8
spl>
write 18 - 15 / 3;
13
But alas, none of the following examples work:
spl>
write -(5 + 5);
-10
spl>
write 4 > 3;
true
spl>
write 11 = 12;
false
spl>
write true;
true
spl>
write true and false;
false
spl>
write false or true;
true
spl>
write not true;
false
spl>
write 6 + 3 != 2 and not 6 + 3 <= 2;
true
So fix it!
This requires implementing the eval
methods in the classes mentioned below.
For each of these,
you will have to declare a new method in the class with a line like
Value eval();
You can then either define the method right there in the ast.hpp
file, or externally in the ast.cpp
file. Remember, external
definitions require you to specify the class name, like
Value ArithOp::eval() { /* code here */ }
Notice that the eval
method in ArithOp
has
a switch
statement depending on which operator is
being used. Look in the class declaration for how the operator is stored,
and see the top of ast.hpp
for the enum type Oper
that you will need to use.
Remember that an operation like +
must evaluate its arguments
first before using them!
This is accomplished by calling the eval
method recursively.
For instance, the first argument to an ArithOp
is evaluated
by calling left->eval()
.
Look at the definition of the Value
class in value.hpp
.
This is the type returned by the recursive calls to eval()
.
You will be interested in using the num()
method, which returns
the integer value that is stored in the object. (The tf()
method returns the bool value instead.)
Also note that we have constructors from the basic
types, so for example to return a numerical value of 8, you would write
something like return Value(8);
.
A line like this constructs and then returns a Value object holding the integer
(or boolean) value that it's given.
Tip: once you get your interpreter going, it will start to get really annoying having the AST pop up on the screen every time. To get rid of it, just change the line
bool showAST = true;
in the main method of spl.ypp
to false
.
This will ensure that the AST still gets generated
and saved in spl.pdf
, but you don't have to look at it unless
you need to.
Exercises
-
Write the
eval()
method inNegOp
, so that unary negation with the-
sign works. -
Write the
eval()
method inCompOp
, so that integer comparison operations like5 != 12
work. -
Write the
eval()
method inBoolExp
, so that boolean constants likefalse
work. -
Write the
eval()
methods inBoolOp
andNotOp
, so that theand
,or
, andnot
operators work.
7 Variables
Look at the SymbolTable
class in the starter code file
st.hpp.
There is a global SymbolTable object ST
that will
be used to implement a single, global scope. (Don't worry! We'll implement
proper scoping in the coming labs.) Read and understand what
the three public methods of the class do.
Exercises
-
Get variable declaration and assignment working using
new
and:=
. This means implementing theexec()
methods in classesNewStmt
andAsn
.
You should print helpful error messages and set theerror
flag to true if a variable is declared twice (twonew
's) or if it's assigned before it's declared. For example:
(You will have to use the methods of thespl>
new x := 5; # This works fine
spl>
x := 20; # This is fine too
spl>
y := 20;
ERROR: Can't rebind y; not yet bound!
spl>
new x := 30;
ERROR: Variable x already bound!
SymbolTable
class inst.hpp
. You might have to change some methods in that class to get all the error handling working correctly.) -
Get variable reference working in the
spl
interpreter program. A program like
should work and producenew x := 5; x := x + 10; write x;
15
.
This means implementingeval()
in theId
class.
8 Control Structures
It looks like there are three control structures in SPL:
if
, ifelse
, and while
.
But actually these are only two, because an if
and an
ifelse
statement in the code both become a single
IfStmt
object in the AST. The only difference is that
the else block might be NullStmt
if the first syntax
was used.
To get these structures working, you first need to implement the
exec()
method in the Block
class.
You may have noticed that blocks with curly braces produce valid ASTs,
but still make the interpreter give up. Well, a bunch of statements surrounded
by curly braces is a Block
, and that's where you will start.
Exercises
-
Get blocks of statements in curly braces working.
This means writing the
exec()
method in theBlock
class, so that a program like
works correctly and prints out 100. HINT: This should be a two-liner!{ new var := 10; var := var * var; write var; }
-
Get
if
s andifelse
s working. This means writing theexec()
method in theIfStmt
class, so that a program like
works correctly and prints out false.ifelse 40 < 11 { write true; } { write false; if 2 != 2 { write 2; } }
-
Get
while
statements working. This means writing theexec()
method in theWhileStmt
class, so that a program like
works correctly and prints outnew i := 3; while i > 0 { write i; i := i - 1; }
3
2
1
9 Last one: read
Okay, now just about everything is working except for functions
(which we'll deal with next class) and read
.
In SPL, read
is an expression, which means that it
returns a value (namely, whatever the user types in).
Reading input can create some awkward problems because your actual
code (the SPL program that is being interpreted) is coming from stdin, but
so is the input to that program when you have read
expressions. To help make this a little less confusing, print out a
read>
prompt when the read
statement
executes. Then a run of your interpreter might look like:
spl>
new secret := 42;
spl>
new numtries := 0;
spl>
while read != secret { numtries := numtries + 1; }
read>
15
read>
-3
read>
21
read>
42
spl>
write numtries;
3
Exercises
-
Get the
read
command working. Typingnew x := 4; x := read;
into your interpreter should cause it to wait until an integer is typed in. Then writingwrite x;
should echo that typed-in integer back to the screen. This means writing theeval()
method in theRead
class. -
Now your interpreter should have everything you need to
run your program from Exercise #1! If you give a filename as a command-line
argument to the
spl
interpreter, it reads and executes SPL code from that file instead of reading in the code from the terminal. So you should be able to do:
There is nothing to hand in or to test for this part but rest assured that I will test it when you submit your code! (Plus it should feel damn good to get this working.)roche@ubuntu$
./spl ex1.spl
read>
5
read>
6
read>
12
read>
0
23
10 Two OPTIONAL extensions
If you've gotten this far, great job! Let's add two more nice features that will make programming more pleasant: short-circuit evaluation, and debugging statements.
Short-circuit evaluation means cutting off a boolean expression
with and
s and or
s as soon as we know the answer.
For instance, since
false and XXXX
must be false no matter what XXXX
is,
we can just return false
from this without actually evaluating
the second argument.
Debugging statements are little snippets of text that print to the
screen when a certain part of your code is reached. Since SPL doesn't
have any support for strings, we have to add a language feature to support
this. Our approach will be that anything enclosed in double-quotes is
a debugging statement. This will require a new AST Stmt
node
class
called Debug
, which you will have to create.
Some specifics on the debug statements
- The debug statements can go in between other statements, but not in the middle.
So
write 5; "hello!" write 6;
would be OK, but notwrite "the interruptor!" 5;
. - Debug statements do not need to be followed by a semicolon.
- Because debug statements are going into your AST, they should be executed
every time that part of the program is executed. So for example, running
should not result in any output.if false { "toodaloo" }
Exercises
-
(OPTIONAL)
Get short-circuit evaluation working correctly for boolean operators.
Expressions like the following should then work:
(would have given division-by-zero error before), andwrite false and 1/0;
(would have given undefined variable error before).write true or what;
-
(OPTIONAL)
Add in debugging statements as described above. Besides the new
Debug
class inast.hpp
, you will also need a new token type, a new scanner rule, and a new parser rule. But it'll be worth it, I swear!