Your code may be in "free form" Fortran, according to the Fortran 95 standard. However, you may not use preprocessing directives.
Save your program in a file called proj.f95
in folders called proj1
and proj2
respectively for phases 1 and 2 of your project.
I will test your code in the same environment as the lab
machines in MI 302, using the commands
/usr/bin/gfortran proj.f95 -o proj ./proj
For phase 2 of your project, you will write a program to generate and
then execute a bottom-up parser. You should submit your program in a folder
called proj2
, and be sure that is runs as described above.
Your program will take as input the
specification of a CFSM as generated by bison
.
So that you don't have to spend all your time dealing with string manipulation
and I/O in Fortran, I have written a little Python script that transforms
the output from bison -v
to a file called spec.txt
that will be a much friendlier for parsing. The Python script is
called fixoutput.py, and after downloading you
can run it by typing
python fixoutput.py something.outputwhere
something.output
is the result of running a command
like bison -v something.ypp
.
For your project, you can assume that all the above has already happened
and you have a file called spec.txt
that contains the following,
in order:
NUM NAME
, where NUM
is a non-negative integer and NAME
is the name of the corresponding
token. Actually, your program won't really need to know the names of the
tokens since everything will be numeric, but this information might be
helpful to you in debugging.
NUM NAME
, where NUM
is a non-negative integer and NAME
is now the name
of a non-terminal in the grammar.
Again, your program won't really need to know the names of the
non-terminals, but this might help in debugging.
X s Y
, where X and Y are numbers
and s is literally the letter s, meaning that if the look-ahead is X, shift
and go to state number Y.X r Y
, where X and Y are numbers
and r is literally the letter r, meaning that if the look-ahead is X,
reduce using grammar rule number Y.a
, where
a is literally the letter a, meaning that
if we get to this state in the CFSM, we should accept the string as being
in the language and return.An example of this file format is given below. Your program's job will be to read in this description of a CFSM and then actually perform a bottom-up parse according to that language specification.
The CFSM your program creates will simply be an array of states, indexed by their state numbers. Each state will itself be an array of actions to take. Each action might be yet another nested array. The details of this storage implementation are up to you. However, I suggest you store a list of symbol-action pairs, where each action is either a positive integer, meaning to shift and go to that numbered state, or a negative integer, meaning to reduce by that (negative) numbered rule. You will have to think of some special way of indicating the "accept" action.
After creating the CFSM, you should read a series of integers from standard in (which you may safely assume all correspond to tokens), and parse that string of tokens using the CFSM. This will involve maintaining a stack of symbol-state pairs as described in class, an indicator of the current (or next) state, and the saved value of the next "peeked at" token from the input. Your parser doesn't need to do any interpreting, but it should print out the tokens and non-terminals in the stack (as intgers) at every step in the process. These should be printed on a single line, separated by spaces, from the bottom of the stack to the top.
And of course
your program should
also identify and report any parse errors along the way. After reading
the token 0
, which will always correspond to $end
,
the EOF indicator, your program should halt. That is, you only have
to parse a single stream of tokens.
Your parsing algorithm will simply involve a loop that looks at the next state number and the look-ahead token, and then looks up the corresponding action to perform. You should either produce a parse error if there is no action specified for that kind of token, or else perform the specified shift or reduce action. For a shift, this means adding the next token to the top of the stack and updating the current state. For a reduce, this means looking at the specified grammar rule, popping a certain number of symbols off the stack, adding a certain non-terminal to be the next "peeked at" symbol on the input stream, and finally updating the current state to be the saved state from the top of the stack.
I suggest you write your program in the following steps. Of course you are free to develop however you wish. As always, you are encouraged to submit every time you get some small step of the program working.
spec.txt
file.
Instead, start by making the global storage
for the CFSM array, the stack, the next state, and
the peeked-at input symbol.
spec.txt
file. I would suggest creating a
spec.txt
CFSM description that exactly matches a
manually-created CFSM that you have working. Then, piece by piece, remove
the parts of your program that specify some part of the parser, and instead
read that input from the spec.txt
file. Stop often for
testing!
.ypp
bison specification files
that we have used in class and in labs can generate a .output
file which (using the Python script) can be turned into a
spec.txt
file
that your program should be able to read. Examine the stack contents
at each step to make sure the bottom-up parse is executing correctly.
You do not have to create your program in the exact way that I have suggested. However, for full credit your program should be well-documented and easy to follow. Every helper function you create should have documentation, and in general it should be easy for a non-Fortran programmer to see how your program works. You should also document how your data structures work, like the array of CFSM states.
The file spec.output is bison's output for a very simple grammar for adding and subtracting numbers. If this file is in the current directory, along with the Python program fixoutput.py, then running the command
python fixoutput.py spec.output
Will produce the following spec.txt
file:
0 $end 258 NUM 259 OPA 5 $accept 6 st 7 exp 0 5 6 0 1 6 6 7 2 6 7 3 7 7 259 258 4 7 258 0 258 s 1 6 s 2 7 s 3 1 258 r 4 259 r 4 0 r 4 2 0 s 4 258 s 1 7 s 5 3 259 s 6 258 r 2 0 r 2 4 a 5 259 s 6 258 r 1 0 r 1 6 258 s 7 7 258 r 3 259 r 3 0 r 3
This indicates the three tokens, three non-terminals, and 8 CFSM states of the parser. Then if I compile and run your program as above, then by typing the string
258 259 258 0
into standard in, your program should write
258 7 7 259 7 259 258 7 6 6 0
to standard out, and exit with status 0.
If I typed the string
258 259 259 0
instead, your program should write
258 7 7 259 Parse error!
and exit with nonzero exit code.