Today we will see how to use automated tools to generate a scanner and parser based on a token spec (regexes) and grammar that we provide, and then how to use the resulting parse tree to write an interpreter for the calculator language.
We will use these same tools for the interpreters and compilers we create in labs for the rest of the semester:
- A Tokenizer class that Dr. Roche wrote, which uses Java’s built-in regex library based on a tokenSpec.txt file that provides token names and regular expressions
- ANTLR, an automatic parser generator for Java that reads a grammar you write in a .g4 file, and generates Java source code for a working parser for that grammar.
- Apache Maven, a tool used to organize and build large Java projects with dependencies.
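To make the idea of a regex-driven scanner concrete, here is a minimal sketch of one way it could work. This is not the Tokenizer class from the tarball: the class name TokenizerSketch, the token names, and the regexes are all made up for illustration, and the spec is hard-coded instead of being read from a file. It tries every pattern at the current position and keeps the longest match.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT the course's Tokenizer class.
class TokenizerSketch {
    // Token names and regexes, in priority order (all invented for this example).
    static final Map<String, Pattern> SPEC = new LinkedHashMap<>();
    static {
        SPEC.put("NUM", Pattern.compile("[0-9]+"));
        SPEC.put("ID", Pattern.compile("[a-z]+"));
        SPEC.put("OP", Pattern.compile("[-+*/]"));
        SPEC.put("LP", Pattern.compile("\\("));
        SPEC.put("RP", Pattern.compile("\\)"));
        SPEC.put("SKIP", Pattern.compile("\\s+")); // whitespace is matched but dropped
    }

    // Splits the input into "NAME:text" strings using longest-match at each position.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> entry : SPEC.entrySet()) {
                Matcher m = entry.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() only succeeds if the match starts exactly at pos
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = entry.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("No token matches at position " + pos);
            }
            if (!bestName.equals("SKIP")) {
                tokens.add(bestName + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ID:print, LP:(, NUM:3, OP:*, NUM:4, RP:)]
        System.out.println(tokenize("print(3*4)"));
    }
}
```

The real token spec presumably has its own names and rules (for example, a dedicated token for print), so treat this only as a picture of the general technique.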
Getting the code
Download calc.tgz.
Extract this tarball (in Linux) with the command:
tar -xzvf calc.tgz
That will create a folder calc with the following subfolders and
files:
- src/main/resources/si413/tokenSpec.txt: Token specifications (names and regexes) for the calc language
- src/main/antlr4/si413/ParseRules.g4: ANTLR file with grammar rules for the calc language
- src/main/java/si413/Interpreter.java: The interpreter for the calc language that pulls it all together: runs the tokenizer and parser, then executes the resulting parse tree.
- calc-ex.prog: A simple example program in the calc language
- pom.xml: Configuration file for maven so it knows how to compile your code (using ANTLR) and download the dependencies. You will not need to modify this file at all.
- run.sh: Bash script to help run the interpreter. You will not need to modify this file at all.
- src/main/java/si413/Tokenizer.java: Scanner using Java regexes that reads its specs from the tokenSpec.txt resource file. You will not need to modify this file at all.
- src/main/java/si413/Errors.java: Error handling code for the scanner and parser. You will not need to modify this file at all.
Note: Maven (and Java in general) is kind of annoying in
what folders things go in, as you can see from the names above.
These rules are strict, based on the package name si413 that we
will use for our code this semester. Pay attention to the folder names!
If you use an IDE, it will probably know about maven and be able to help
you.
Compiling and running
Go into the newly-created calc folder on the command line, and run
mvn package
That one command tells maven to:
- Download any needed Java or dependency libraries
- Make sure you are compiling and running under Java 17
- Run ANTLR to read the ParseRules.g4 grammar file and generate a parser for that specific language in a file ParseRules.java
- Compile your java code with access to the various dependencies and generated code
- Put this all into a runnable jar file at target/calc-1.0.jar
If the mvn command doesn’t work at all on your VM, you can run
sudo apt install maven
to install it. Otherwise, ask your instructor if you run into issues.
To run the interpreter on the example program, you can run mvn package to
get the jar file, and then run
java -jar target/calc-1.0.jar calc-ex.prog
But this process (packaging the jar, then running it) is kind of painful
to keep doing when you are developing code. I made a small bash script
run.sh to help with that. By default, it compiles and then runs the main
method in the Interpreter class. So you can do
./run.sh calc-ex.prog
Understanding
The three crucial files which control the behavior of the interpreter
are tokenSpec.txt, ParseRules.g4, and Interpreter.java. Look
through these and make sure you understand how the pieces fit together.
When we run a program like
print(3*4)
what is happening in the tokenizer, the parser, and then in
Interpreter.java? Where (in code) does the actual multiplication happen?
Where does the actual printing occur?
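As a way to think about these questions, here is a small sketch of the general "walk the tree" idea, using a hand-built tree instead of the classes ANTLR generates. Everything in it (TreeWalkSketch, Num, Mul, Add, eval, runPrint) is a made-up name for illustration, and it assumes integer values; the real Interpreter.java is organized around ANTLR visitors and will look different.

```java
// Hypothetical sketch: a hand-built expression tree, NOT the generated ANTLR classes.
class TreeWalkSketch {
    interface Expr {}
    record Num(int value) implements Expr {}
    record Mul(Expr left, Expr right) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}

    // Evaluates the children first, then combines their values at this node.
    static int eval(Expr e) {
        if (e instanceof Num n) return n.value();
        if (e instanceof Mul m) return eval(m.left()) * eval(m.right());
        if (e instanceof Add a) return eval(a.left()) + eval(a.right());
        throw new IllegalArgumentException("Unknown node: " + e);
    }

    // A print statement evaluates its expression and produces the text to show.
    static String runPrint(Expr e) {
        return String.valueOf(eval(e));
    }

    public static void main(String[] args) {
        // Tree for the expression 3*4, as a parser might hand it to the interpreter
        Expr tree = new Mul(new Num(3), new Num(4));
        System.out.println(runPrint(tree)); // prints 12
    }
}
```

In this sketch the arithmetic lives in the code that handles the Mul node and the output happens in the code that handles the statement. Try to find the corresponding places in the real interpreter.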
Enhancements
Today in class you will get comfortable with this new build setup by making some changes to the language specs (tokens and grammar), and the interpreter itself (java file), and then compiling, running, and debugging using maven.
Try making the following enhancements to the calc language interpreter:
- The main statement commands are print and save. To avoid unnecessary typing, allow shortened versions of these commands with a single character like ! or $.
  (Should only require changing tokenSpec.txt)
- Right now, parentheses can’t be used for grouping sub-expressions, like
  print(2 * (3 + 4))
  Add this support for parenthesized expressions.
  (Hint: you need to add a new grammar rule for expr, and then add a new method in the ExpressionVisitor subclass inside Interpreter.java.)
- The parentheses used for print and save statements aren’t actually necessary in this language. Get rid of them, so a program like this will work:
  print 1 + 2
  save 17 - 5
  print x * 2 + 20
  (Should require changing just one file - which one?)
- Add an exponentiation operator ^, so that we can write an expression like 3^4 and get 81.
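For the exponentiation enhancement, the arithmetic itself could be done with a small helper like the sketch below. It assumes the calc language works with whole numbers and non-negative exponents, which you should check against the existing interpreter; PowerSketch and power are made-up names. Note that Java has no ^ power operator for numbers (in Java, ^ is bitwise XOR), so the interpreter has to compute the power itself.

```java
// Hypothetical helper, assuming integer values and exponents >= 0.
class PowerSketch {
    // Computes base^exponent by repeated squaring (no overflow checking).
    static long power(long base, long exponent) {
        if (exponent < 0) {
            throw new ArithmeticException("negative exponent not supported");
        }
        long result = 1;
        while (exponent > 0) {
            if ((exponent & 1) == 1) {
                result *= base;   // this bit of the exponent contributes a factor
            }
            base *= base;         // square the base for the next bit
            exponent >>= 1;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(power(3, 4));            // prints 81
        // If ^ is treated as right-associative, 2^3^2 means 2^(3^2)
        System.out.println(power(2, power(3, 2)));  // prints 512
    }
}
```

Whether ^ should group to the right (as in most math notation) and how it ranks against * and + are decisions you make in the grammar, not in this helper.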