Smalltalk

Useful Links

GNU Smalltalk. The Smalltalk-80 interpreter that we will use. Contains links to documentation on the language as well as the interpreter and GUI.
Wikipedia page
smalltalk.org contains a number of useful links, including tutorials and guides as well as historical information.
99 bottles of beer program.

How I will run your code

You may want to use the GUI by running gst-blox for development. However, I will test your code by running the command-line version gst as described below.

Save your program in a file called proj.st, in a folder called proj1 for phase 1 and in a folder called proj2 for phase 2 of your project.

I will test your code in the same environment as the lab machines in MI 302, using the command

  /usr/bin/gst proj.st

Phase 2 Assignment

For phase 2 of your project, you will write a program to generate and then execute a bottom-up parser. You should submit your program in a folder called proj2, and be sure that it runs as described above.

Specification

Your program will take as input the output from a call to bison -v that we saw in Lab 5. That is, we will let bison generate and describe the CFSM for bottom-up parsing in some language, and then your program will actually create this CFSM and attempt to parse a string of tokens in the language.

Specifically, your program should take as input a file called spec.output. The format of this file is the same as any other .output file produced when you run bison -v. The file contains, in order:

A numbered listing of the grammar rules in the language.
A list of all the strings that refer to terminals or tokens in the language, including the special EOF token $end.
A list of all the non-terminals in the language.
A numbered listing of each state in the CFSM. Each state listing contains a list of LR items, followed by a list of symbols and the corresponding transition or action. Transitions can be either shift transitions that say "go to state XX", or reduce actions that say "reduce using rule YY". The YY here refers to one of the grammar production rules numbered at the top of the file.
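As a rough illustration of reading the transition and action lines in a state listing, here is a minimal sketch. It is written in Python purely as neutral notation (your submission must still be Smalltalk), and it uses the signed-integer encoding suggested later in this specification. The exact line layouts in the comments and the names GOTO_RE, REDUCE_RE and parse_action_line are assumptions for illustration; check them against your own spec.output, since the layout may differ between bison versions.

```python
import re

# Assumed line layouts (verify against your own spec.output):
#     NUM       shift, and go to state 1
#     exp       go to state 3
#     $default  reduce using rule 3 (exp)
GOTO_RE = re.compile(r"^\s+(\S+)\s+(?:shift, and )?go to state (\d+)\s*$")
REDUCE_RE = re.compile(r"^\s+(\S+)\s+reduce using rule (\d+)\b")


def parse_action_line(line):
    """Return (symbol, code) for an action line, or None for any other line.

    code is +n for "go to state n" and -r for "reduce using rule r".
    """
    m = GOTO_RE.match(line)
    if m:
        return m.group(1), int(m.group(2))
    m = REDUCE_RE.match(line)
    if m:
        return m.group(1), -int(m.group(2))
    return None  # LR items, blank lines, headers, "accept", and so on


print(parse_action_line("    NUM  shift, and go to state 1"))
print(parse_action_line("    $default  reduce using rule 3 (exp)"))
```

Lines that describe LR items or anything else simply yield None, so the same function can be applied to every line of a state listing.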

Your program must read in this spec.output file and create the CFSM that is described there. This CFSM will simply be an ordered array of states. You should make a new object type for a CFSM state that will contain information about the transitions and actions for that state.
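One possible way to get the "ordered array of states" is to first cut the file into one group of lines per state. The sketch below (Python as neutral notation only, not Smalltalk) assumes each state begins with a header line of the form "State N" or "state N" on a line by itself; that header layout and the names STATE_RE and split_states are assumptions, not something this page guarantees.

```python
import re

# Assumed header layout: "State 7" (or "state 7") alone on a line.
STATE_RE = re.compile(r"^[Ss]tate (\d+)\s*$")


def split_states(lines):
    """Group the lines of the state listing by state, in file order.

    Lines before the first state header (grammar, terminals, ...) are skipped.
    The result is a list whose index is the state number.
    """
    states = []
    for line in lines:
        m = STATE_RE.match(line)
        if m:
            if int(m.group(1)) != len(states):
                raise ValueError("state headers are not numbered 0, 1, 2, ...")
            states.append([])
        elif states:
            states[-1].append(line)
    return states


sample = [
    "Grammar",
    "State 0",
    "    NUM  shift, and go to state 1",
    "State 1",
    "    $default  reduce using rule 3 (exp)",
]
print(len(split_states(sample)))
```

Each group can then be turned into one state object, so that the position in the array doubles as the state number used by the transitions.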

After creating the CFSM, you should read a series of strings from standard in (which you may safely assume are all token names), and parse that sequence of tokens using the CFSM. This will involve maintaining a stack of symbol-state pairs as described in class, an indicator of the current (or next) state, and the saved value of the next "peeked at" token from the input. Your parser doesn't need to do any interpreting, but it should print out the symbols of the stack at every step in the process. These should be printed on a single line, separated by spaces, from the bottom of the stack to the top. So for instance a partial stack in parsing the Scheme language might be printed as

LP exprseq LP expr

And of course your program should also identify and report any parse errors along the way. After reading the token $end, your program should halt. That is, you only have to parse a single stream of tokens.

The CFSM state object should contain code to actually perform the parser actions. I suggest you store a list of symbol-action pairs in each state, where each action is either a positive integer, meaning to shift and go to that numbered state, or a negative integer, meaning to reduce by that (negative) numbered rule in the grammar. After peeking at the next token, the CFSM transition function should either produce a parse error if there is nothing to do for that kind of token, or else perform the specified shift or reduce action. For a shift, this means adding the next token to the top of the stack and updating the current state. For a reduce, this means looking at the specified grammar rule, popping a certain number of symbols off the stack, adding a certain non-terminal to be the next "peeked at" symbol on the input stream, and finally updating the current state to be the saved state from the top of the stack.
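The shift and reduce steps described above can be sketched as follows. This is only one reading of the description, written in Python as neutral notation (the assignment itself must be in Smalltalk). The grammar, the hand-built state table, and the names RULES, State, Parser, CFSM and parse are all hypothetical. Two choices here are assumptions: each stack pair saves the state the parser was in before that symbol was pushed, and a "$default" entry is used when a state has no entry for the peeked symbol.

```python
from collections import deque

# Hypothetical grammar, chosen only for illustration:
#   1 st: exp      2 exp: exp OPA NUM      3 exp: NUM
RULES = {1: ("st", 1), 2: ("exp", 3), 3: ("exp", 1)}  # rule -> (lhs, rhs length)


class State:
    """One CFSM state: maps a symbol to +n (go to state n) or -r (reduce by r)."""

    def __init__(self, actions):
        self.actions = actions

    def step(self, parser):
        """Perform one action on the peeked symbol; return False on a parse error."""
        symbol = parser.pending[0]
        action = self.actions.get(symbol, self.actions.get("$default"))
        if action is None:
            return False
        if action > 0:  # shift: push (symbol, state we came from), then move on
            parser.pending.popleft()
            parser.stack.append((symbol, parser.state))
            parser.state = action
            parser.trace.append(" ".join(s for s, _ in parser.stack))
        else:  # reduce: pop the right-hand side, restoring the saved state
            lhs, length = RULES[-action]
            for _ in range(length):
                _, parser.state = parser.stack.pop()
            parser.pending.appendleft(lhs)  # non-terminal becomes the peeked symbol
        return True


class Parser:
    def __init__(self, states, tokens):
        self.states, self.pending = states, deque(tokens)
        self.stack, self.state, self.trace = [], 0, []

    def run(self):
        """Return True once $end has been shifted, False on a parse error."""
        while self.pending:
            if not self.states[self.state].step(self):
                break
            if self.stack[-1:] and self.stack[-1][0] == "$end":
                return True
        self.trace.append("Parse error!")
        return False


# Hand-built CFSM for the hypothetical grammar; list index = state number.
CFSM = [
    State({"NUM": 1, "st": 2, "exp": 3}),
    State({"$default": -3}),
    State({"$end": 4}),
    State({"OPA": 5, "$default": -1}),
    State({}),  # state reached after $end; never consulted in this sketch
    State({"NUM": 6}),
    State({"$default": -2}),
]


def parse(tokens):
    """Return (ok, trace) for a list of token names."""
    parser = Parser(CFSM, tokens)
    ok = parser.run()
    return ok, parser.trace


ok, trace = parse("NUM OPA NUM $end".split())
print("\n".join(trace))
```

Under these assumptions the stack is recorded after every push, whether the pushed symbol is a token or a non-terminal produced by a reduce. One thing to watch with the signed-integer encoding: a transition into state 0 would be stored as 0, which is neither positive nor negative, so a table that needs one (such as a single state that loops back to itself) has to be represented differently.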

Tips

I suggest you write your program in the following steps. Of course you are free to develop however you wish. As always, you are encouraged to submit every time you get some small step of the program working.

For starters, don't worry about reading the spec.output file. Instead, start by making your object definitions and the global storage for the stack, next state, and peeked-at input symbol.
Once you have the very basics set up from the first part, have your code manually create a very simple CFSM with a single state that just shifts every token it sees and transitions back to itself. Run and test your program to make sure it continually shifts the tokens (strings) that you type in, and prints the contents of the stack after each step.
Still manually creating CFSMs within your program, start adding a few more states to make sure you have the transitions correct. Next, add a single reduce rule with a non-terminal. Again, keep testing your program as you go along to make sure it works properly.
Once you are confident that your program works on a manually-created CFSM, go back and try to read in the CFSM description from the spec.output file. I would suggest creating a spec.output CFSM description that exactly matches a manually-created CFSM that you have working. Then, piece by piece, remove the parts of your program that specify some part of the parser, and instead read that input from the spec.output file. Stop often for testing!
Once you think you have everything working, try more examples to make sure. Remember that any of the .ypp bison specification files that we have used in class and in labs can generate a .output file that your program should be able to read. Examine the stack contents at each step to make sure the bottom-up parse is executing correctly.

Style Requirements

You do not have to create your program in the exact way that I have suggested. However, for full credit your program should

Be well-documented and easy to follow. Every helper function you create and every class should have documentation, and in general it should be easy for a non-Smalltalk programmer to see how your program works.
Use the object-oriented programming paradigm encouraged by the Smalltalk language. By this I mean not just to use classes and objects (which Smalltalk forces you to do), but to actually think about object-oriented programming and structure your code accordingly.

Example

The file spec.output is bison's output for a very simple grammar for adding and subtracting numbers. If this file is in the current directory, and I compile and run your program as above, then by typing the string

NUM OPA NUM $end

into standard in, your program should write

NUM
exp
exp OPA
exp OPA NUM
exp
st
st $end

to standard out, and exit with status 0.

If I typed the string

NUM OPA OPA $end

instead, your program should write

NUM
exp
exp OPA
Parse error!

and exit with a nonzero exit code.