This is the archived website of SI 335 from the Spring 2014 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.
This project contains both written and electronic (programming) parts. The electronic parts (code) must be submitted according to the instructions on this page. The written parts must be typed or neatly written, and turned in in to the instructor along with a cover sheet. The cover sheet must contain your name, the name of the course SI 335, and the name of the project, Project 1.
The year is 1984 and the world is ruled by just a few super-states. One of them, Oceania, is dominant and controls both North and South America, sourthern Africa, and Australia. It is engaged in a perpetual, unwinnable war with the other super-powers for the purpose of subjugating its own citizens.
To support the perpetual war, the Oceania Military Academy (OMA) churns out an amazingly high number of graduates every year (in the millions), and the process of selecting which part of the military service they will each enter is a daunting task. This being Oceania, the students themselves (called Proles) have no say in their own service selection. Instead, they are all ranked by each of the different communities, who then take turns in picking which Proles they will get.
Your task is to design an algorithm and write a computer program that reads in the names of the k different communities, as well as the n Proles and their rankings in each community, and prints the results of the service selection. This works similarly to a draft: the k communities, in the order they were read in, take turns picking, one at a time, until there are no more Proles left. For each pick at each iteration, the community chooses the highest-ranking Prole (according to its own ranking) that has not been picked yet.
Specifically, the input, which should be read from standard in, contains in order
For example, consider the following sample input file:
3 Tanks Infantry Paratrooper 5 Bill 4 5 1 Suzy 1 1 4 John 2 4 5 Kate 3 2 3 Mike 5 3 2
The Tanks community will choose Suzy first. Then the Infantry would like to choose Suzy but can't, so they pick Kate instead. Next the Paratroopers pick Bill, and so on. The correct output would be
Suzy Tanks Kate Infantry Bill Paratrooper John Tanks Mike Infantry
The following programs are available to help you:
You can get all these files at once by downloading the file
proj1.tar.gz
and running tar xzvf proj1.tar.gz
I have written a simple algorithm to solve the service selection
problem and implemented it in Python, C++, and Java. Those are the
selection.X
files above. Start by reading over one or
all of these to see how my algorithm works. Note: this is a really
slow way to solve the problem! Your job will be to improve it, of
course.
The other three files are to help you generate test cases.
The text files jobs.txt
(Marine MOS codes from the Vietnam
era) and names.txt
(All first names with frequency at least
5 in the United States in 1984) contain a whole bunch of "communities"
and names. The gencases.py
program will read from these
text files and write (to standard out) a sample output with n
and k as specified on the command line. For example:
> python3 gencases.py 5 3 3 Basic Intelligence Marine Air Support Control Officer Aircraft Structures Mechanic A-6/EA-6 5 Rebeca 2 2 2 Kecia 1 4 1 Kereem 3 3 5 Reno 4 5 4 Lashauna 5 1 3
You can use this program to test your code. For example, here
is a bash session showing me creating a sample input in
in.txt
, then testing all three programs, and finally using
the diff
utility to make sure the outputs all match.
> python3 gencases.py 100 10 > in.txt > python3 selection.py < in.txt > out1.txt > g++ selection.cpp -o selection > ./selection < in.txt > out2.txt > javac selection.java > java selection < in.txt > out3.txt > diff -q out1.txt out2.txt > diff -q out1.txt out3.txt > # Note: no output means they match!
Once you are convinced your program is correct, you will probably
want to know how fast it is. The Linux time
program is a
great resource for this. Since we don't care about the output when
doing timings, you could do the timing in one line. For example, to time
the Python version of the algorithm with n=500 and k=15, you
could do:
> python3 gencases.py 500 15 | /usr/bin/time -v python3 selection.py > /dev/null Command being timed: "python3 selection.py" User time (seconds): 4.11 System time (seconds): 0.01 Percent of CPU this job got: 98% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.20 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 27184 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 1855 Voluntary context switches: 2 Involuntary context switches: 205 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0
This tells us that it took about 4.22 seconds and 27MB to complete the selection process.
The first two tasks are written, and the last is electronic. See the instructions at the top of this assignment for how to submit each part.
Analyze my "simple service selection algorithm" provided in the starter code. The algorithm and running time is exactly identical between the three programming languages; I suggest you read over all three to help you understand what is going on. I want a big-O analysis of the worst-case cost, in terms of k and n. Try to make your analysis as tight as possible.
Describe an improved algorithm for the same problem in pseudocode, and give a worst-case big-O analysis, in terms of k and n. Try to give a tight analysis, and try to design your algorithm to be as efficient as possible. If necessary, you may assume that k is much smaller than n. So for example, O(kn) time is much better than O(n2) time.
Implement your improved algorithm, along with any other tweaks
you can think of, so that your code is as fast as possible.
I recommend that you code in either C++, Java, or Python, but you
may use any
programming language that you wish, as long as it runs in the Linux
environment in the labs. In any case, your improved program should be
in a single file called sel.EXT
, where EXT is
cpp
, java
, py
,
or another extension depending on your language. If you
don't use one of these three languages,
you must also submit a README.txt
file with
instructions on how to compile and run your program.
Bonus: The fastest (correct) program on a large
sample input of my choosing will earn a tangible prize and a 1% bonus
on this project for their section of the class.
WARNING: C++ (with the -O6
flag to the g++
compiler) has the potential to be much faster than Java, for the same
algorithm. Both languages will probably be at least 10 times faster than
Python for the same algorithm.