Problem 76

This is the archived version of this course from the Spring 2013 semester. You might find something more recent by visitning my teaching page.

Max bucket size in a hash table

Due: April 12
Points: 5

In class, we were deriving an asymptotic bound for the expected size of the largest bucket in a simple (single hash function) hash table. Here's what we said:

  1. We want to determine the probability that all of the buckets have fewer than \(k\) elements in them. Call this probability \(p\):
  2. \(p = \mathrm{Pr}[\text{every bucket has fewer than k elements}]\)
  3. Given some subset of \(k\) elements, the probability that all of those \(k\) have the same hash value is \[q = \left(\frac{1}{s}\right)^{k-1} = \frac{1}{s^{k-1}}\]
  4. Therefore the probability that \(k\) elements do not all have the same hash value is \[1-q = 1-\frac{1}{s^{k-1}}\]
  5. There are \(\binom{n}{k}\) subsets of \(k\) inserted elements.
  6. Every bucket has fewer than \(k\) elements if every subset of \(k\) elements don't all have the same hash value. From steps (4) and (5), this is \[p = (1-q)^{\binom{n}{k}} = \left(1-\frac{1}{s^{k-1}}\right)^{\binom{n}{k}}\]
  7. We can approximate \(\binom{n}{k}\) as follows: \[\binom{n}{k} = \frac{n(n-1)(n-2)\cdots(n-k+1)}{k(k-1)(k-2)\cdots 1} \approx \left(\frac{n}{k}\right)^k\]
  8. For any big numbers \(a\) and \(b\), we can approximate \((1-1/a)^b\) as \[\left(1 - \frac{1}{a}\right)^b = \left(\left(1 - \frac{1}{a}\right)^a\right)^{b/a} \approx \left(\frac{1}{\mathbb{e}}\right)^{b/a} = \mathbb{e}^{-b/a}\] where \(\mathbb{e}\) is the constant you all know and love, \(\mathbb{e}\approx 2.71828\)

Now use these steps to finish off the proof that we should expect buckets to be \(O((\log n)/(\log\log n))\) in size, when \(s=2n\).

To accomplish this, first plug in \(2n\) for \(s\) in the formula from (6), and set \(p=1/2\). Next, use (7) and (8) to simplify the exponential in terms of \(\mathbb{e}\) and without any combinations or factorials.

After you take the log of both sides, you should get a formula for \(\lg n\) in terms of \(k\), showing that \(\lg n \in \Theta(k\log k)\). Now you just need to reverse this to show the result we want, that \(k \in \Theta((\log n)/(\log\log n))\).