Project 1

This is the archived website of SI 335 from the Spring 2014 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

Due Date: Monday, February 10

This project contains both written and electronic (programming) parts. The electronic parts (code) must be submitted according to the instructions on this page. The written parts must be typed or neatly written, and turned in in to the instructor along with a cover sheet. The cover sheet must contain your name, the name of the course SI 335, and the name of the project, Project 1.

1 Service Selection at the OMA

The year is 1984 and the world is ruled by just a few super-states. One of them, Oceania, is dominant and controls both North and South America, sourthern Africa, and Australia. It is engaged in a perpetual, unwinnable war with the other super-powers for the purpose of subjugating its own citizens.

To support the perpetual war, the Oceania Military Academy (OMA) churns out an amazingly high number of graduates every year (in the millions), and the process of selecting which part of the military service they will each enter is a daunting task. This being Oceania, the students themselves (called Proles) have no say in their own service selection. Instead, they are all ranked by each of the different communities, who then take turns in picking which Proles they will get.

2 Particulars

Your task is to design an algorithm and write a computer program that reads in the names of the k different communities, as well as the n Proles and their rankings in each community, and prints the results of the service selection. This works similarly to a draft: the k communities, in the order they were read in, take turns picking, one at a time, until there are no more Proles left. For each pick at each iteration, the community chooses the highest-ranking Prole (according to its own ranking) that has not been picked yet.

Specifically, the input, which should be read from standard in, contains in order

The integer k
k strings (each without any spaces) for the community names
The integer n
n lines, each containing a string (with no spaces) for the Prole's name, followed by k positive integers, for the rankings of that Prole by each community.

3 Example

For example, consider the following sample input file:

3 
Tanks 
Infantry 
Paratrooper 
5 
Bill 4 5 1 
Suzy 1 1 4 
John 2 4 5 
Kate 3 2 3 
Mike 5 3 2

The Tanks community will choose Suzy first. Then the Infantry would like to choose Suzy but can't, so they pick Kate instead. Next the Paratroopers pick Bill, and so on. The correct output would be

Suzy Tanks 
Kate Infantry 
Bill Paratrooper 
John Tanks 
Mike Infantry

4 Provided Starter Code

The following programs are available to help you:

You can get all these files at once by downloading the file proj1.tar.gz and running tar xzvf proj1.tar.gz

I have written a simple algorithm to solve the service selection problem and implemented it in Python, C++, and Java. Those are the selection.X files above. Start by reading over one or all of these to see how my algorithm works. Note: this is a really slow way to solve the problem! Your job will be to improve it, of course.

The other three files are to help you generate test cases. The text files jobs.txt (Marine MOS codes from the Vietnam era) and names.txt (All first names with frequency at least 5 in the United States in 1984) contain a whole bunch of "communities" and names. The gencases.py program will read from these text files and write (to standard out) a sample output with n and k as specified on the command line. For example:

> python3 gencases.py 5 3
3
Basic Intelligence Marine
Air Support Control Officer
Aircraft Structures Mechanic A-6/EA-6
5
Rebeca 2 2 2
Kecia 1 4 1
Kereem 3 3 5
Reno 4 5 4
Lashauna 5 1 3

You can use this program to test your code. For example, here is a bash session showing me creating a sample input in in.txt, then testing all three programs, and finally using the diff utility to make sure the outputs all match.

> python3 gencases.py 100 10 > in.txt

> python3 selection.py < in.txt > out1.txt

> g++ selection.cpp -o selection
> ./selection < in.txt > out2.txt

> javac selection.java
> java selection < in.txt > out3.txt

> diff -q out1.txt out2.txt
> diff -q out1.txt out3.txt
> # Note: no output means they match!

Once you are convinced your program is correct, you will probably want to know how fast it is. The Linux time program is a great resource for this. Since we don't care about the output when doing timings, you could do the timing in one line. For example, to time the Python version of the algorithm with n=500 and k=15, you could do:

> python3 gencases.py 500 15 | /usr/bin/time -v python3 selection.py > /dev/null
        Command being timed: "python3 selection.py"
        User time (seconds): 4.11
        System time (seconds): 0.01
        Percent of CPU this job got: 98%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.20
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 27184
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 1855
        Voluntary context switches: 2
        Involuntary context switches: 205
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

This tells us that it took about 4.22 seconds and 27MB to complete the selection process.

5 Your Tasks

The first two tasks are written, and the last is electronic. See the instructions at the top of this assignment for how to submit each part.

Analyze my "simple service selection algorithm" provided in the starter code. The algorithm and running time is exactly identical between the three programming languages; I suggest you read over all three to help you understand what is going on. I want a big-O analysis of the worst-case cost, in terms of k and n. Try to make your analysis as tight as possible.
Describe an improved algorithm for the same problem in pseudocode, and give a worst-case big-O analysis, in terms of k and n. Try to give a tight analysis, and try to design your algorithm to be as efficient as possible. If necessary, you may assume that k is much smaller than n. So for example, O(kn) time is much better than O(n²) time.
Implement your improved algorithm, along with any other tweaks you can think of, so that your code is as fast as possible. I recommend that you code in either C++, Java, or Python, but you may use any programming language that you wish, as long as it runs in the Linux environment in the labs. In any case, your improved program should be in a single file called sel.EXT, where EXT is cpp, java, py, or another extension depending on your language. If you don't use one of these three languages, you must also submit a README.txt file with instructions on how to compile and run your program.

Bonus: The fastest (correct) program on a large sample input of my choosing will earn a tangible prize and a 1% bonus on this project for their section of the class. WARNING: C++ (with the -O6 flag to the g++ compiler) has the potential to be much faster than Java, for the same algorithm. Both languages will probably be at least 10 times faster than Python for the same algorithm.