Haskell

Useful Links

GHC, the Glasgow Haskell Compiler, which we will be using.
Wikipedia page
The Learning Haskell page is chock-full of great (free) resources including books and tutorials.
99 bottles of beer program.

How I will run your code

Save your program in a file called proj.hs in folders called proj1 and proj2 respectively for phases 1 and 2 of your project. Be sure to include a main function.

I will test your code in the same environment as the lab machines in MI 302, using the commands

  /usr/bin/ghc --make proj.hs -o proj
  ./proj

Phase 2 Assignment

For phase 2 of your project, you will write a program to generate PREDICT and FOLLOW sets for a given grammar. You should submit your program in a folder called proj2, and be sure that it runs as described above.

Specification

Your program will read the grammar specification from a file called spec.txt. The format of this file is illustrated in the example below. Each line of the spec.txt file is a single production rule in the grammar, without anything fancy like | or *. Each rule is simply a non-terminal symbol, then a single colon, then 0 or more non-terminal symbols, until the end of the line. Non-terminals are specified with all lower-case names, and tokens with all upper-case names. Each non-terminal can appear as the left-hand side of multiple production rules, and they do not have to be all together (for instance, foo in the previous grammar). The non-terminal on the left-hand side of the first rule is defined to be the start symbol, no matter what it might be called. Notice that epsilon productions don't explicitly write "epsilon", they just have an empty right-hand side.

Your program should apply the rules for computing EPS, PREDICT, and FOLLOW sets according to this handout, repeatedly applying each rule until nothing more can be added to the sets. Then all the PREDICT and follow sets should be printed out, on single lines, with tokens separated by commas. The EOF symbol should be printed as $ as usual. The ordering that the sets are printed in does not matter.

Tips

I suggest you write your program in the following steps. Of course you are free to develop however you wish. In any case, I strongly recommend that you start with a small, working program, and then try to add more functionality in very small increments. As always, feel free to submit every time you complete some small step.

Start by just reading in the spec.txt file and storing each rule in a list of strings. Make helper functions to determine if a string is a token or non-terminal, and then check that each rule list starts with a non-terminal and contains only tokens and non-terminals.
Now you have to determine what kind of data structure you are going to store the sets in. Fortunately, there is a built-in Set datatype in Haskell that you can import from Data.Set. This would be good for storing EPS and each of the PREDICT and FOLLOW sets. You also need a way to find the PREDICT and FOLLOW sets for a given non-terminal quickly, so it would probably be a good idea to create two Maps (imported from Data.Map) mapping strings (non-terminal names) to sets (the PREDICT and FOLLOW sets). Once you figure out your data structures, initialize them to be empty, and then have your program do nothing and just print out the empty sets for each non-terminal.
Now start adding in rules. The first might be to automatically add $ to the FOLLOW set of the start symbol, and to add any non-terminal with an empty right-hand side to EPS.
Code up the more complicated rules for PREDICT and FOLLOW as actual Haskell functions. These functions should take a single production rule from the grammar (list of strings), and whatever the current sets are, and return modified sets based on the information in that production rule. So for instance one function will just check if the right-hand side of the rule starts with a token, and if so add this token to the PREDICT set of the left-hand side. Try to make these functions simple; they should just examine the single production rule and look for one case of things to add to PREDICT or FOLLOW.
The next step is just to apply your functions to the rules. First put all the functions from above into a list of functions - this is possible in a functional language like Haskell. Make sure the functions have exactly the same signature! Now make a higher-order function that takes this list of functions, and applies every function to every production rule from the grammar exactly once. You can get this going even before your functions from the previous step are complete - just add more functions to the list as you write them!
Finally, you need to apply every function to every rule repeatedly until nothing more is added to any set. Again, use functional programming for this! Make a function that calls the apply-all-functions function from the previous part, and then either calls itself again recursively if the sets have been updated, or returns the unmodified sets otherwise.

For full credit, your program should:

Be well-documented and easy to follow.
Embrace the functional programming paradigm. Even if you don't follow my steps above, your program should be ripe with higher-level functions, and probably use some anonymous functions ("lambda abstractions") as well.

Example

Here is an example spec.txt file:

start : BOO foo BAR baz
foo : baz
baz : 
baz : BOO foo BOO
foo : BAR BAR BOO

Given this file, your program should output the following sets, in any ordering:

PREDICT(start) = BOO
PREDICT(foo) = BAR,BOO,$
PREDICT(baz) = BOO,BAR,$
FOLLOW(start) = $
FOLLOW(foo) = BAR,BOO
FOLLOW(baz) = $,BAR,BOO