You may work in pairs for this lab. If you choose to do so, only have one partner submit the required files electronically. Be sure to include both partners' names at the top of every submitted file.
This lab is due at the beginning of your next lab period. You should submit a folder called lab03 containing all of the following files:
yylex()
functions for
bison-generated
parsers. Flex and bison are GNU versions of lex and yacc,
which are traditional Unix development tools.
They both have manuals online:
flex manual,
bison manual.
Download the following files into your lab03 directory:
These three files define a simple calculator program. In fact, it's the same simple program we had from Class 7. The big difference is that now, instead of writing our scanner by hand, we're using the standard tool flex to create it.
Look at the Makefile. We start off with the file calc.ypp that defines the grammar and action rules for our calculator. The call
bison -d calc.yppcreates files
calc.tab.cpp
and
calc.tab.hpp
. The first contains the C++ code for
the parser (which is a function called
yyparse()
), the second is a header file that
contains, among
other things, the definitions of the token names NUM, OPA,
etc., and the variable yylval
that we use to pass
the bison code the semantic values of tokens. The flex file
calc.lpp
needs those definitions so that the
yylex()
function it defines for bison can pass
back the information it needs to pass back.
The next thing we have is the flex file calc.lpp
.
The call
flex -o lex.yy.cpp calc.lppcreates the file
lex.yy.cpp
that contains, among
other things, the definition of the yylex()
function. Of course we haven't talked about flex, so this file
is new. The basic idea is that you write regular expressions
for the different token types, and actions for each regular
expression. The flex generated scanner code tries to match
characters from the current input stream to these regular
expressions, and when a match is found, it executes the
associtated action code. The variable yytext
contains the string (in the C sense, i.e. '\0' terminated
char*.) of characters that were matched. When more than one
match is possible it breaks ties by going with the longest
match --- i.e. it follows the "maximal munch" rule. When
there are several "longest" matches, it goes with
the regular expression listed first in the flex file.
Of course, you should realize that flex is simply creating the
same program implementing a finite automaton that we talked
about in class. The difference is that we users specify
tokens by regular expressions. So flex has to convert the
regular expressions into finite automata for each token type,
which are then combined into a single finite automaton. This
is then converted, automatically, into a program.
make
and confirm that the mycalc program works
as intended.
For the remainder of this lab, you will add features to the simple
calculator program. These features will involve significant additions to
the scanner (calc.lpp
), and some to the parser
(calc.ypp
) as well.
The features you will implement are listed as A-J below. Everyone must implement features A-E. You must also implement at least two of the features F-J.
Next week, each submission will be (anonymously) reviewed by your classmates. The goal of this peer review will be to judge whether all features have been implemented correctly and completely. When developing your lab this week, think about how you would rigorously test every feature. Part of your lab grade will be based on the quality of your review.
To facilitate this review, you must list every feature that you implement in the features.txt file. An example file is here: features.txt. Since I have to process these files very quickly at the beginning of next week's lab, you must follow the following formatting rules:
%%
, and ending
with a completely blank line. The first line (after the %%
)
should give the name of that feature, and the following lines
include any description or details of how that feature works in
your program. For instance,%%Feature A Here are the details for how feature A works in my program. This is a feature that was SUCCESSFULLY IMPLEMENTED. This part can contain any amount of text, but no blank lines. The blank line below ends this block.
$$
instead.
You will be reviewed by your peers and by your instructor on all features
that you worked on, whether successfully implemented or not. You will get
the highest grade for features that you claim are successfully implemented,
and which actually are. The next highest grades will be for features that
are almost complete, and which you acknowledge (with $$
)
are not completed. Lower grades will be given to features which are found
to be buggy, but which claim to be successfully implemented.
Modify the scanner so that it accepts integers in either decimal or binary. Specifically, a string of only 0's and 1's that ends in the letter "b" (with no whitespace in between) should be interpreted as a binary integer. For example:
> ./mycalc 10 + 3; 13 10b + 3; 5
This should only require changing the flex scanner specification
file. The cstdlib
function strtol
will be
useful here (check it out with man strtol
).
To convert a binary integer stored in the C string called
str
, you can write strtol(str, NULL, 2)
.
// print n as a signed binary number
void printsbin(int n)
{
if (n < 0) { cout << "-"; n = -n; }
char res[33] = { '\0' };
int i;
for(i = 31; n != 0; --i, n /= 2)
res[i] = (n & 1 ? '1' : '0');
cout << (i == 31 ? "0" : res + i + 1);
}
Now this is going to require modifying the bison file
calc.ypp
as well as the flex file. It
should be a simple change if you understand the system,
though:
you need to define a new token (I might call it STOPB,
or something like that), you need to modify your flex
file to recognize the new token, and you need to create
a new rule in the grammar to use the token.
> ./mycalc 10+3*5; 25 10+3*5;b 11001 10b+3*5;b 10001 10b+3*5; 17
/* like this */
are not completely straightforward with flex. Why? Well,
suppose you added the flex rule:
"/*".*"*/" { }This doesn't work because
.*
doesn't capture
\n
's, and the whole point of C-style comments
is to allow multi-line comments. So we make it like this:
"/*"(.|\n)*"*/" { }This doesn't work either, and the problem is more fundamental. Remember that the rule is that the longest match possible is the one flex makes, so in
/* start */ x = 4; /* end */flex thinks the whole string is a comment. In this case, we don't want maximal munch! (We have the same problem with string literals.) So we'll take the easy way out and use another common comment convention: A "#" character starts a comment and the comment extends to the next newline character. Here's an example session:
> ./mycalc 1+2+3+4+5+6+7+8+9+10; # sum 1..10 computed explicitly 55 10*(10+1)/2; # sum 1..10 computed by Gauss's closed form 55Note that although the newline is part of the comment, the comment does separate tokens. So
1000# beginning of one million 000# end of one millionis interpreted as 1000 followed by 0, which gives an error. Not a lexical error, though, rather a parse error!
One of the real advantages of using a tool like flex as opposed to writing your scanner by hand is that adding new tokens usually has no effect whatsoever on what you've already done. You simply add a new line to your flex file. When you do this by hand, that's not the case: a new token may change the underlying finite automaton substantially.
Our calculator is getting a bit more interesting. One thing it doesn't allow, though, is variables. So let's add variables. We'll start with just a single variable called an accumulator. Observe that there are two ingredients to supporting a variable:
res: SETACC exp STOPto the grammar, along with a new token type
SETACC
. This token should correspond to
some new operator that we choose; let's use a single exclamation point
"!
". This statement will set the
value of the accumulator to the result of exp
.int
somewhere
in your code. Should it be in the (bison) parser code or the
(flex) scanner code?@
, and make a new token called
ACC
. Then we also need to add this into the grammar:
ACC
should be another expansion of the nonterminal
factor
.> ./mycalc ! 5; @; 5 ! 3 * 2; @; 6 @ + 4; 10You may decide to use symbols other than
!
and @
,
but be sure to document this in your features.txt
file if you do.
Now let's allow many variables instead of just one. This will involve creating a symbol table to map variable names to values.
We won't require any declarations; having a name on the left-hand side of an assignment will be enough to create it. Syntax-wise, we have some decisions to make:
:=
, =
,
to
as in x to 1
,
and <- as in x <- 1
.
The last one is going to be problematic for us (why?).
We'll use ":=
" for these examples. If you choose to
use something else, be sure to document it in
features.txt.
[a-zA-Z_][a-zA-Z0-9_]*You can make whatever restrictions you like on variable names, as long as they include normal, alphabetic names such as
var
.
Document the allowable variable names in your
features.txt file.
Semantically, we have a few decisions as well:
res: NAME ASN exp STOP
, where NAME and ASN
are new tokens.
Finally, we have some technical implementation issues to worry about.
The C++ Standard Template Library (STL) is an extremely powerful language feature ... not always super easy to use, but very powerful. One of its most useful "containers" is the map. A "map" is essentially the abstract data type the red-black tree was invented to implement. It's a lot like an array that you don't need to index by consecutive integers, a structure called an associative array. Here's an example that uses a map mapping C++-style strings to int's:
#include <string> #include <iostream> #include <map> using namespace std; int main() { map<string,int> T; string s; for(int i = 1; cin >> s && s != "done"; ++i) T[s] = i; cout << endl; cin >> s; if (T.find(s) == T.end()) cout << "String " << s << " was not entered." << endl; else cout << "String " << s << " was the " << T[s] << "th string entered." << endl; }
A sample run of this program:
foo bar star cat done cat String cat was the 4th string entered.
Note that you must #include <map>
to
use maps. We can simply declare our symbol table as a map
in the initial declarations section of the bison file.
The way this works is that T
is initially empty, but as
soon as you index it with something, like
T[string("the")]
,
the mere act of indexing creates
an entry with key "the"
. One thing about
this is that you can't really test to see whether
there's already an entry for "the"
this
way.
The way you make that test is:
if (T.find(string("the")) == T.end()) ...
.
string
(i.e. C++-style
string) in the %union
statment, but that
doesn't work. Bison is unable to handle any kind of
object there that has a constructor. So ... we will pass
around C-style strings instead. So you'll add something
like
%union { int val; char sym; char *str; };to your bison file so that you can handle semantic values that are C-style strings. One pitfall: in your flex file,
yytext
is a C-style string giving the text
that matched the token's pattern. However, you can't just
assign that to yylval.str
, because flex may
use yytext's memory for later scanning. Instead you need
to make a copy of yylval.str
. The
strdup
function from C's string library
(that'd be #include <cstring>
in C++)
does that for you. You can check it out with man
strdup
. This approach of passing C-style strings
as semantic values is fine for this lab, but it has some
long term issues, since we may "lose" strings before we
have a time to free the memory they use.
:=
for assignment, we should be able to
do something like:> ./mycalc x := 7; y := 93; # Now we compute the sum x + ... + y y*(y + 1)/2 - (x-1)*x/2; 4350 # Now in binary! y*(y + 1)/2 - (x-1)*x/2;b 1000011111110
You are required to implement at least two of the following "extra" features. Be sure to document what you do in features.txt.
"a comment"
, then we
should be able to do things like:
> ./mycalc 5 + 3; "A comment" 8 5 + "Another comment" 3; 8
> ./mycalc var := 10; VAR; 10 vAR := 11; var; 11
int
s, so still no decimal
points and the exponent can't be very big. It's up to you to specify the
exact syntax, but this should not require any new grammar rules.
For instance,
> ./mycalc 21e2; 2100 5e6; 5000000