Problem 68

This is the archived version of this course from the Spring 2013 semester. You might find something more recent by visitning my teaching page.

Another hash table simulation

Due: April 5
Points: 7

Write a Java, C++, or Python program that demonstrates how big the buckets might get in a hash table that has multiple perfect hash functions.

You are going to modify your program from Problem 64 so that it takes a third command-line argument: \(k\). This integer is the number of distinct hash functions that your hash table simulation will use.

Here's how it works: you start with an empty table of size \(s\). Then you "insert" \(n\) items into this table. For each "item", you compute \(k\) hash functions, which just means picking \(k\) random integers between 0 and \(s-1\). Then you choose the location out of those \(k\) that has the smallest count and increase that count by one. That is, you "insert" the new item into the least-filled of the \(k\) computed positions of the hash table.

(You might want to look at this page for some background on this hashing idea.)

At the end, print out a single line for each entry in your table, along with a number of X's equal to the size. On the last line, print out what is the maximum bucket size. This is the same as Problem 64.

Finally, do an empirical study of the effect \(k\) has on the maximum bucket size when \(s=2n\). What you should do is choose a large value for \(n\), like 1,000,000 or so. Then set the hash table size to \(s=2n\). Keep \(s\) and \(n\) the same over a series of experiments, starting with \(k=1\) and incresing to \(k=10\). Make a graph plotting the maximum bucket size as a function of \(k\), and connect the dots to show the curve.

Then make an educated wager: What do you think the best value for \(k\) would be?

Submit your program according to the instructions on the submit page. Turn in a write-up of your empirical study on paper.