Fortran

Useful Links

gfortran Wiki. Information on the GNU Fortran compiler can be found here, including detailed documentation, links to tutorials, etc.
Wikipedia page
The official ISO Fortran 95 standard.
99 bottles of beer program.

How I will run your code

Your code may be in "free form" Fortran, according to the Fortran 95 standard. However, you may not use preprocessing directives.

Save your program in a file called proj.f95 in folders called proj1 and proj2 respectively for phases 1 and 2 of your project.

I will test your code in the same environment as the lab machines in MI 302, using the commands

  /usr/bin/gfortran proj.f95 -o proj
  ./proj

Phase 2 Assignment

For phase 2 of your project, you will write a program to generate and then execute a bottom-up parser. You should submit your program in a folder called proj2, and be sure that is runs as described above.

Specification

Your program will take as input the specification of a CFSM as generated by bison. So that you don't have to spend all your time dealing with string manipulation and I/O in Fortran, I have written a little Python script that transforms the output from bison -v to a file called spec.txt that will be a much friendlier for parsing. The Python script is called fixoutput.py, and after downloading you can run it by typing

python fixoutput.py something.output

where something.output is the result of running a command like bison -v something.ypp.

For your project, you can assume that all the above has already happened and you have a file called spec.txt that contains the following, in order:

A series of lines of the form NUM NAME, where NUM is a non-negative integer and NAME is the name of the corresponding token. Actually, your program won't really need to know the names of the tokens since everything will be numeric, but this information might be helpful to you in debugging.
A single blank line.
A second series of lines of the form NUM NAME, where NUM is a non-negative integer and NAME is now the name of a non-terminal in the grammar. Again, your program won't really need to know the names of the non-terminals, but this might help in debugging.
A second single blank line.
A series of grammar rules. Each rule is numbered and is on a single line. The line starts with the rule number, then the number of the left-hand side non-terminal, then the numbers of the tokens and non-terminals on the right-hand side, in order.
A third single blank line.
A series of CFSM states. These state specifications consist of a line containing a single integer, which is the number of that state, followed by lines that indicate the actions of the state. Each action line starts with the number of the look-ahead token or non-terminal that will trigger that action. These lines have the following forms:
- X s Y, where X and Y are numbers and s is literally the letter s, meaning that if the look-ahead is X, shift and go to state number Y.
- X r Y, where X and Y are numbers and r is literally the letter r, meaning that if the look-ahead is X, reduce using grammar rule number Y.
- a, where a is literally the letter a, meaning that if we get to this state in the CFSM, we should accept the string as being in the language and return.
CSFM states are again separated by single blank lines.

An example of this file format is given below. Your program's job will be to read in this description of a CFSM and then actually perform a bottom-up parse according to that language specification.

The CFSM your program creates will simply be an array of states, indexed by their state numbers. Each state will itself be an array of actions to take. Each action might be yet another nested array. The details of this storage implementation are up to you. However, I suggest you store a list of symbol-action pairs, where each action is either a positive integer, meaning to shift and go to that numbered state, or a negative integer, meaning to reduce by that (negative) numbered rule. You will have to think of some special way of indicating the "accept" action.

After creating the CFSM, you should read a series of integers from standard in (which you may safely assume all correspond to tokens), and parse that string of tokens using the CFSM. This will involve maintaining a stack of symbol-state pairs as described in class, an indicator of the current (or next) state, and the saved value of the next "peeked at" token from the input. Your parser doesn't need to do any interpreting, but it should print out the tokens and non-terminals in the stack (as intgers) at every step in the process. These should be printed on a single line, separated by spaces, from the bottom of the stack to the top.

And of course your program should also identify and report any parse errors along the way. After reading the token 0, which will always correspond to $end, the EOF indicator, your program should halt. That is, you only have to parse a single stream of tokens.

Your parsing algorithm will simply involve a loop that looks at the next state number and the look-ahead token, and then looks up the corresponding action to perform. You should either produce a parse error if there is no action specified for that kind of token, or else perform the specified shift or reduce action. For a shift, this means adding the next token to the top of the stack and updating the current state. For a reduce, this means looking at the specified grammar rule, popping a certain number of symbols off the stack, adding a certain non-terminal to be the next "peeked at" symbol on the input stream, and finally updating the current state to be the saved state from the top of the stack.

Tips

I suggest you write your program in the following steps. Of course you are free to develop however you wish. As always, you are encouraged to submit every time you get some small step of the program working.

For starters, don't worry about reading the spec.txt file. Instead, start by making the global storage for the CFSM array, the stack, the next state, and the peeked-at input symbol.
Once you have the very basics set up from the first part, have your code manually create a very very simple CFSM with a single state that just shifts every token it sees and transitions back to itself. Run and test your program to make sure it just continually shifts tokens (strings) that you type in, and prints them out after each step on the stack.
Still manually creating CFSMs within your program, start adding a few more states to make sure you have the transitions correct. Next, add a single reduce rule with a non-terminal. Again, keep testing your program as you go along to make sure it works properly.
Once you are confident that your program works on a manually-created CFSM, go back and try to read in the CFSM description from the spec.txt file. I would suggest creating a spec.txt CFSM description that exactly matches a manually-created CFSM that you have working. Then, piece by piece, remove the parts of your program that specify some part of the parser, and instead read that input from the spec.txt file. Stop often for testing!
Once you think you have everything working, try more examples to make sure. Remember that any of the .ypp bison specification files that we have used in class and in labs can generate a .output file which (using the Python script) can be turned into a spec.txt file that your program should be able to read. Examine the stack contents at each step to make sure the bottom-up parse is executing correctly.

Style Requirements

You do not have to create your program in the exact way that I have suggested. However, for full credit your program should be well-documented and easy to follow. Every helper function you create should have documentation, and in general it should be easy for a non-Fortran programmer to see how your program works. You should also document how your data structures work, like the array of CFSM states.

Example

The file spec.output is bison's output for a very simple grammar for adding and subtracting numbers. If this file is in the current directory, along with the Python program fixoutput.py, then running the command

python fixoutput.py spec.output

Will produce the following spec.txt file:

0 $end
258 NUM
259 OPA

5 $accept
6 st
7 exp

0 5 6 0
1 6 6 7
2 6 7
3 7 7 259 258
4 7 258

0
258 s 1
6 s 2
7 s 3

1
258 r 4
259 r 4
0 r 4

2
0 s 4
258 s 1
7 s 5

3
259 s 6
258 r 2
0 r 2

4
a

5
259 s 6
258 r 1
0 r 1

6
258 s 7

7
258 r 3
259 r 3
0 r 3

This indicates the three tokens, three non-terminals, and 8 CFSM states of the parser. Then if I compile and run your program as above, then by typing the string

258 259 258 0

into standard in, your program should write

to standard out, and exit with status 0.

If I typed the string

258 259 259 0

instead, your program should write

258
7
7 259
Parse error!

and exit with nonzero exit code.