This is the archived website of SY 301 from the Fall 2016 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

Unit 12: Secure Communication

1 Talking to strangers

Suppose two people, Alice and Bob, want to communicate with each other over an insecure channel Examples of an insecure channel might be writing on a public message board, or sending email, or Starbucks wireless, or really any kind of Internet communication that's not HTTPS. Alice and Bob want to be sure that a third party listening to their transmissions, Eve (the evil eavesdropper), won't be able to understand what they are saying.

Beginning of Class 37

This can be addressed with a symmetric encryption scheme like our Enigma machine (though in modern times we would use something more secure such as AES). If Alice and Bob both encrypt their messages to each other, and Eve does not know the key they are using, then she will not be able to learn anything from her eavesdropping. We call encryption like the Enigma machine "symmetric" because both parties, Alice and Bob, use the same key to encrypt and decrypt messages.

The problem in this scenario is, what key should Alice and Bob use? If they have never met before and can only communicate over the insecure channel, how can they be sure that they both are using the same key and that no one else knows what it is? We call this the key exchange problem, and until 1974 it was thought to be impossible to solve. Every cryptographic method for secure communication relied on pre-establishing a shared key via a secure channel (in person, through certified mail, etc) before two parties could communicate over an insecure channel.

In fact, the idea was so radical that at the time the cryptographer Ralph Merkle submitted a project proposal for his masters class which contained the basics of a secure key exchange protocol and it was rejected by his professor, who said he should work on something more reasonable. Key exchange is so important that, without it, the Internet would not exist as we know it today.

If you go to any HTTPS website and click the link for "more details" on the security, you'll see something like the image above. Notice carefully the text under "Secure Connection":

The connection to this site is encrypted and authenticated using a strong protocol (TLS 1.2), a strong key exchange (ECDHE_RSA), and a strong cipher (AES_128_GCM).

The TLS protocol is something you'll learn about in your networking class, and the AES cipher is a symmetric encryption scheme similar in concept (but much, much more secure) than the Enigma cipher. What's in between? A strong key exchange. Two of the components of the acronym ECDHE_RSA are the two cryptographic tools we're going to learn in this unit, Diffie-Hellman and RSA public-key encryption.

2 Group theory

Crucial to our key exchange protocol will be group theory. A group is composed of two things: a set of elements and an operation which maps between them. For instance, we can take the integers as a set of elements and addition as an operation to form a group. This is commonly written as \((\mathbb{Z}, +)\).

Group axioms

However, not all elements and all operations will make a proper group. For the set of elements \(G\) with the operation \(\cdot\) to be a group, the following properties, called group axioms, must hold (this also gives us a chance to review our mathematical notation):

  • Closure: Applying the operation to two group elements always gives you another group element.
    (\(\forall x,y \in G,\ x\cdot y\in G\))
  • Associativity: When you have a bunch of group operations to do in a row, which one you do first doesn't matter.
    (\(\forall x,y,z\in G, (x\cdot y)\cdot z = x\cdot(y\cdot z)\))
  • Identity element: One of the group elements has no effect on whatever you operate on it with.
    (\(\exists i \in G \mid \forall x\in G,x\cdot i=x\))
  • Inverse element: Every group element can be "canceled out" by another group element to give you the identity.
    (\(\forall x \in G, \exists y \in G \mid x\cdot y=i\))

Lets think about the integers and addition again. If you add any two integers, you will get another integer and never, say, a decimal number, so closure is straightforward. We also know that addition is associative so that is okay as well. The identity element in addition is zero, since adding zero doesn't change the value of an integer. Finally, every element \(x\) has an inverse integer \(-x\) such that \(x + -x = 0\).

The idea of a group fits nicely with integers, but it is actually a more general mathematical concept. All the possible sequences of rotations of a Rubik's cube also form a group, where the "group operation" in this case is doing one sequence of rotations, followed by the other sequence. For every rotation, there is a reverse rotation that will put the cube back in its original state. If you consider the "null rotation" where you do nothing, then it even has an identity element. Looking at Rubik's cubes in terms of group theory leads to some interesting observations on the complexity of solving them, and when you team this mathematical machinery with powerful computers, you can discover some surprising results.

Multiplicative groups

Integers with the addition operator are easy to understand but relatively uninteresting from a cryptographic perspective. So what if the operation is multiplication? The integers with multiplication, \((\mathbb{Z},\times)\) is good in terms of closure and associativity, and 1 works as the identity element. But there's a problem with inverses: take an integer like 4. We know that its multiplicative inverse is the fraction \(\tfrac{1}{4}\), also known as \(0.25\). But this inverse is not an integer, so that's a violation of the axiom. \((\mathbb{Z},\times)\) does not make a group.

We can fix this problem by making the set bigger. For example, if we consider the set of real numbers (i.e., decimals with possibly infinite expansions), then \((\mathbb{R}, \times)\) almost forms a group. The only remaining problem is zero, which obviously doesn't have a multiplicative inverse because nothing can be multiplied by 0 to get the identity 1. So we remove 0 from the set, writing \(\mathbb{R}\setminus\{0\}\), or more commonly, \(\mathbb{R}^*\). \((\mathbb{R}^*, \times)\) is a multiplicative group.

Group exponentiation

Remember that groups have only one operation defined, so in a multiplicative group where the operation is \(\cdot\), there is for example no such thing as subtraction like \(x-y\) where \(x,y\in G\). no such thing as addition or subtraction for example.

What about exponentiation, like \(x^4\)? Although this operation isn't part of the group axioms, we know that raising something to an integer power really means multiplying it by itself that many times. So exponentiation is a thing in a multiplicative group, as long as the exponent is a nonnegative integer.

3 Diffie-Hellman Protocol

Now we're ready to define the Diffie-Hellman key exchange protocol. It's a multi-step operation between Alice and Bob. Again, remember the goal is that Alice and Bob come up with a shared secret key, such that anyone who intercepted all of the traffic between Alice and Bob still wouldn't know their shared key.

Diffie-Hellman Key Exchange Protocol
  1. Alice chooses a group \((G,\cdot)\) and a group element \(x \in G\).
  2. Alice chooses a random secret exponent \(a\) and computes \(x^a\) in the group.
  3. Alice sends the group \((G,\cdot)\), the base element element \(x\in G\), and the computed power \(x^a\in G\) to Bob.
  4. Bob chooses a random secret exponent \(b\) and computes \(x^b\) as well as \((x^a)^b\) in the group, using the values Alice sent him.
  5. Bob sends \(x^b\in G\) back to Alice.
  6. Alice computes \((x^b)^a\) in the group, using the value Bob sent to her.
  7. At this point, Alice and Bob both have the value of group element \[k = (x^a)^b = (x^b)^a = x^{ab},\] which is their shared secret key.

See how it works? The main idea is that Bob can compute \(k=x^{ab}\) because he knows \(x^a\) and \(b\), and similarly Alice can compute \(k=x^{ab}\) because she knows \(x^b\) and \(a\), but Eve only sees the values of \(x\), \(x^a\), and \(x^b\), so she's stuck!

Now there's just one question left to answer: what group should we use? It turns out this is a really important question that makes the whole thing work! If we choose a bad group in one way, then computing the group exponentiations that our friends Alice and Bob have to do might take forever, which would make this protocol useless. On the other hand, if we choose a bad group in a different way, then it might be possible for Eve to discover the secret key by figuring out \(a\) or \(b\).

DH with real numbers?

Let's try the one multiplicative group we know so far, which is the nonzero real numbers, \((\mathbb{R}^*,\times)\). Unfortunately there are many problems with this choice of group:

  • Computers can't handle floating-point numbers exactly. There is always some round-off error, and this error can get big when you do exponentiation. This is usually not too bad a problem when doing scientific computing since your measurements might not be accurate to 64 bits anyway, but in cryptography we have to be exact! Alice and Bob have to get exactly the same key \(k\), with exactly the same bits, or else they won't be able to decode each other's messages using that key.
  • It's too easy for Eve to crack this protocol. Since Eve sees the values of \(x\) and \(x^a\), to discover Alice's secret exponent \(a\) Eve just needs to compute \(\log_x(a)\), which can be done very quickly, basically in linear-time. Once Eve knows \(a\), she can compute \(k = (x^b)^a\) and the shared secret is no longer so secret.
  • If Alice chooses \(x\) and \(a\) large enough to make Eve's logarithm computation hard, then the sheer size of \(x^a\) will be terabytes long, way too big to send to Bob, let alone for him to compute another exponentiation with.

In summary, using real numbers is useless, slow, and insecure. The basic protocol is fine, but we need a group with exact values, where the group exponentiation is going to be fast, but doing logarithms in the group will be really slow. Back to group theory...

4 Modulo groups

The answer is to use modular integers with multiplication as the group. You choose a modulus \(n\), and instead of the group operation being just plain multiplication, it's doing multiplication and then taking the product modulo \(n\). The range of integers modulo \(n\) is \(\{0,1,2,\ldots,n-1\}\), and this is conveniently denoted by \([0,n-1]\) or just \(\mathbb{Z}_n\).

The first three group axioms (closure, associativity, and identity) work just like with plain integers, and 1 is still the identity element.

Modular inverses

Remember the problem with using \((\mathbb{Z},\times)\) as a group was that inverses are not integers. Let's see how using modular multiplication solves this problem. Consider the set \(\mathbb{Z}_5\), composed of:


\(\mathbb{Z}_5 = \{0, 1, 2, 3, 4\}\).


If our function is multiplication mod 5 (that is, \(x\cdot y=(x*y)\,\%\, 5\)), what is the inverse of 3? Well, our identity here is 1, so remember we need a number \(x\) such that \(3 * x = 1\). In normal integers this is impossible, but in a modular group \(3 * 2 = 6\) and \(6 \bmod 5 = 1\). So \(3\) and \(2\) are the inverses of each other. What about \(4\)? It is actually the inverse of itself! \(4 * 4 = 16 \bmod 5 = 1\). \(1\) is also its own inverse.

But zero is still a problem here, just like with real numbers. Zero is a very special number that in this case ruins everything. Anything times zero is zero, so it cannot have an inverse. However, we can deal with this by just removing zero from our set. We can write this as \(\mathbb{Z} \setminus \{0\}\) or more commonly \(\mathbb{Z}^*_n\), showing that the operation is multiplication which implies that zero cannot be in the group. So, \(\mathbb{Z}_5\) with modular multiplication is not a group, but \(\mathbb{Z}_5^*\) is.

Let's consider another group and find its inverses:


\(\mathbb{Z}^*_7 = \{1, 2, 3, 4, 5, 6\}\)

  • \(\mathbf{2} * 4 = 8 \bmod 7 = 1\)
  • \(\mathbf{3} * 5 = 15 \bmod 7 = 1\)
  • \(\mathbf{4} * 2 = 8 \bmod 7 = 1\)
  • \(\mathbf{5} * 3 = 15 \bmod 7 = 1\)
  • \(\mathbf{6} * 6 = 36 \bmod 7 = 1\)

For small numbers we can just guess or intuit the inverses but for large numbers there is an efficient algorithm called the extended Euclidean algorithm which can find the inverse of any number. We don't have time to go into the details of how this works, but here's a short Python function to compute modular inverses so you can see it's quite simple. The running time turns out to be \(O(\log n)\) steps. Try it for yourself!

1
2
3
4
5
6
7
8
9
10
# Computes the inverse of x modulo n
def invmod(x, n):
    a, b, y, z = n, x, 0, 1
    while b:
        a, (q,b) = b, divmod(a, b)
        y, z = z, y - q*z
    if a != 1:
        raise ZeroDivisionError("modular inverse doesn't exist")
    if y > 0: return y
    else: return n+y

Beginning of Class 38

Fast modular exponentiation

How do we actually compute exponentiation when the numbers we are using might be very large? For small numbers, we can multiply one at a time, like to compute \(5^4\) in \(\mathbb{Z}_{17}\), just do: \[5^4 \bmod 17 = (5*5*5*5) \bmod 17 = 625 \bmod 17 = 13.\] You could keep the numbers even a little smaller by taking the mod at each intermediate step, like: \[5^4 \bmod 17 = (25*5*5) \bmod 17 = (8*5*5)\bmod 17 = (40*5)\bmod 17 = (6*5)\bmod 17 = 30\bmod 17 = 13.\]

But what about when the numbers get really big? Suppose that \(n\) is 1000 bits long, which is a relatively low size for Diffie-Hellman, and the exponents \(a\) and \(b\) will be just as big. To compute one modular exponentiation with numbers this size using the method above, would take something like \(2^{1000}\) steps, which is a whole lot.

Aside: How many steps is 21000?

If a super-powerful alien race used every atom in the known universe as a computer running one step per quantum moment since the Big Bang, that would be roughly \(2^{443}\) steps.

So even this super-powerful alien race that has turned every atom in the universe into computers more powerful than we can imagine, would not finish the computation before the human race, the Earth, and indeed our entire galaxy are long, long gone. It's safe to say, any computation of this magnitude is truly impossible.

Fortunately there is a much better way. Think about how we expanded our exponentation before:

\(5^4 = 5 * 5 * 5 * 5\)

We can also write it like this:

\(5^4 = 5^2 * 5^2 = (5^2)^2\)

Arranging it that way, we go from three multiplications to two. We can go even further if the exponent is larger. Instead of calculating \(5^8\) we can do \(((5^2)^2)^2\). Instead of calculating \(5^{16}\) we can do \((((5^2)^2)^2)^2\). You can probably see by now that we are going from \(O(a)\) multiplications for an exponent of \(a\) to \(O(\log a)\) multiplications. This optimization is called repeated squaring.

If we want to calculate an exponent that isn't a power of two, we can still do it with much fewer multiplications. Every number can be written as a sum of powers of 2, so we can do the repeated squaring for all the relevant powers of two and multiply together the ones we need.

  • \(2^{12} = 2^8 * 2^4\)
  • \(2^{15} = 2^8 * 2^4 * 2^2 * 2^1\)
  • \(2^{24} = 2^{16} * 2^8\)
  • \(2^{345} = 2^{256} * 2^{64} * 2^{16} * 2^8 * 2^1\)

Non-prime moduli

There is one remaining hiccup that we have so far avoided by accident. If the modulus number \(n\) is not a prime number then some of the integers between \(1\) and \(n-1\) will actually not be part of the group. Consider the candidate group \(\mathbb{Z}^*_{15}\):

\(\mathbb{Z}^*_{15} = \{1, 2, 3, 4, \ldots, 14\}\)

  • \(\mathbf{1} * 1 = 1 \bmod 15 = 1\)
  • \(\mathbf{2} * 8 = 16 \bmod 15 = 1\)
  • \(\mathbf{3} * ? = 1\)
  • \(\mathbf{4} * 4 = 16 \bmod 15 = 1\)
  • \(\mathbf{5} * ? = 1\)
  • \(\mathbf{6} * ? = 1\)
  • \(\mathbf{7} * 13 = 91 \bmod 15 = 1\)
  • \(\mathbf{8} * 2 = 16 \bmod 15 = 1\)
  • \(\mathbf{9} * ? = 1\)
  • \(\mathbf{10} * ? = 1\)
  • \(\mathbf{11} * 11 = 121 \bmod 15 = 1\)
  • \(\mathbf{12} * ? = 1\)
  • \(\mathbf{13} * 7 = 91 \bmod 15 = 1\)
  • \(\mathbf{14} * 14 = 196 \bmod 15 = 1\)

Try as you might, there are no inverses for 3, 5, 6, 9, 10, or 12. Think about 3: for it to have an inverse \(x\), you need \(3x\) to be one higher than a multiple of 15. But since 15 itself is a multiple of 3, every multiple of 15 is also a multiple of 3, and so one higher than a multiple of 15 will never be a multiple of 3. It's impossible!

Similarly for 5, 6, 9, 10, and 12, what they all have in common is that they share some multiple with 15. Since 15 is 3 times 5, any multiple of 3 or 5 will not have an inverse modulo 15.

More precisely, the group \(\mathbb{Z}^*_n\) contains all the elements from 1 to \(n-1\) which are coprime with \(n\). That means that they share no common factors with \(n\). The number of coprime elements is called the totient of \(n\), written as \(\varphi(n)\).

The very best case is where \(n\) is prime, because then everything from 1 up to \(n-1\) is in the group; in other words, \(\varphi(n)=n-1\).

The second-best case is when \(n\) is the product of two distinct prime numbers \(n=pq\), in which case we say \(n\) is a semiprime and the totient is \(\varphi(n) = (p-1)(q-1)\). For example, with \(n=15\), we have \(\varphi(15)=(3-1)(5-1) = 8\). You can confirm from the table above that indeed there are 8 elements in \(\mathbb{Z}_{15}^*\): \(\{1,2,4,7,8,11,13,14\}\).

5 Diffie-Hellman for real

Now that we know everything we need to know about groups, we can finally get to our actual key exchange protocol. The Diffie Hellman protocol starts with each party agreeing on some public parameters. Everything in DH is calculated in a group \(\mathbb{Z}_p\) where \(n\) is a prime number that both Alice and Bob know. In addition to \(n\), they also agree on \(x\) which is an arbitrary integer in \(\mathbb{Z}_p\), usually chosen as 2 for simplicity. These parameters can be chosen by one party and sent to the other or picked from a set of global parameters known to everyone (published by NIST for instance).

The exchange protocol is the same one described above. Once the public parameters are decided, it's all very simple:


  1. Alice chooses a random integer \(a\) in \(\mathbb{Z}_p\) and sends \(x^a \bmod n\) to Bob.
  2. Bob chooses a random integer \(b\) in \(\mathbb{Z}_p\) and sends \(x^b \bmod n\) to Alice.
  3. Alice calculates \(k = (x^b)^a \bmod n = \mathbf{x^{ab}} \bmod n\).
  4. Bob calculates \(k = (x^a)^b \bmod n = \mathbf{x^{ab}} \bmod n\).

(Warning: this image uses different variable names even though the idea is the same!)

(Warning: this image uses different variable names even though the idea is the same!)


As you can see, the protocol is symmetric. Bob and Alice both do the same thing, generate a random private integer and send it to the other person as the exponent of \(x\). We can see that they both get the same key \(k\) because with exponentiation the order of exponents doesn't matter. Alice starts with \(x^b\) and adds in \(a\) while Bob starts with \(x^a\) and adds in \(b\).

Deriving the key

At the end of the day both Alice and Bob have the same value \(k \in \mathbb{Z}_p\). \(n\) is a random prime number so having a number between 1 and \(n\) doesn't usually give us what we want in terms of a shared key. Most symmetric encryption algorithms, like AES, require a random key in the range \((0, 2^b)\) where \(b\) is the particular key size of the cipher. In order to get a key in that range, the output of the Diffie-Hellman protocol is usually run through a cryptographic hash function and truncated if necessary to match the right key size for the cipher.

Security

So we know that Alice and Bob both get the same key, but how do we know that someone listening in on the conversation can't also get \(k\)? That would defeat the entire purpose of a secure key exchange protocol. We are depending on the fact that \(x^b\) doesn't give any information about \(b\). If anyone knows both \(b\) and \(a\), it is easy to compute \(x^{ab}\). Determining \(b\) from \(x^b\) is called the discrete log problem, and it is believed to be very difficult to solve. The best known algorithm is called the general number field sieve and it runs in time approximately \(2^{\lambda / 3}\), where \(\lambda\) is the length of \(n\) in bits. This is exponential, which we know is very slow.

The security of Diffie-Hellman relies on the fact that raising \(x\) to the power \(b\) is like a one-way function on \(b\). No one can get \(b\) back from \(x^b\), but because of the way exponentiation works we can still use \(x^b\) to calculate \(x^{ab}\) if we know \(a\). However, it is very crucial that one cannot calculate \(x^{ab}\) from \(x^b\) and \(x^a\). There is no way algebraically to combine \(x^b\) and \(x^a\) to get anything but \(x^{b+a}\), which is not useful. An easy way to do this, however, would be to calculate the discrete log of \(x^b\) or \(x^a\). Since that problem is hard, we believe Diffie-Hellman to be secure.

Parameter size: For security, \(n\) should be large enough that computing discrete logs in \(\mathbb{Z}_p\) is difficult. With modern algorithms and computers, a size of \(\lambda = 2048\) seems to be sufficient for security against any possible adversary for many years to come.

Perfect forward secrecy

One of the coolest things about Diffie-Hellman is that it has something called perfect forward secrecy. This means that, if Alice and Bob destroy the key \(k\) and their individual shares \(b\) and \(a\) after they are done communicating, they will have security against anyone recovering \(k\) in the future, even if they recorded the entire conversation, key agreement included. Since neither Alice or Bob remember \(b\) or \(a\), the only information available in a recorded transcript of their communication is \(x^b\) and \(x^a\). We already showed that these alone cannot help in recovering \(k\). Since \(k\) itself is only used during the conversation and not kept long-term by the parties involved, they cannot even reveal it to an adversary later if they wanted to! This means that their conversation can never be decrypted, even if one of them is coerced later into giving up their secret keys. A key that is used for a short time like this and then discarded is called an ephemeral key.

Beginning of Class 39

6 Public-Key Encryption

You learned about the idea of public-key encryption (also known as asymmetric encryption back in SY110. The general idea is that instead of Alice and Bob sharing the same key that they both use to encrypt and decrypt messages to each other, Alice will have two keys with different uses.

Alice's public key is published freely for anyone to see. The corresponding private key is known only to Alice, and she keeps it a secret.

This can be used for two purposes, encryption and signing, as follows:

  • If Bob wants to encrypt a message to Alice, he encrypts it using her public key. Then only Alice will be able to decrypt it using the corresponding private key.
  • If Alice wants to sign a message so people will know it really came from her, she encrypts it using her private key. Then anyone else (Bob for example) can try decrypting using Alice's public key. If it works, then he knows the message really came from Alice.

RSA Encryption

We're now ready to explain the idea of one public-key cryptosystem, which is RSA encryption (named after the three authors Rivest, Shamir, and Adelman).

Again, it's based on group theory, but with an important additional property to before: the group must be cyclic. Among other things, this means that there is some exponent \(e\) such that, for any element \(x\in G\), \(x^e = i\) using the group operation for group exponentiation, where \(i\) is the multiplicative group identity element. Notice this means that \(x^{e+1} = x\) for any \(x\in G\). In other words, there's some power that if you raise any group element to that power, you get back to the original group element itself.

Using that idea, the RSA cryptosystem works very similar to Diffie-Hellman, even though it's doing something quite different. It's split into three parts: key generation (which Alice does once to get her public/private key pair), public encryption (which Bob can do to send a message to Alice or check a signature), and private decryption (with Alice can do to decrypt a message or compute a signature).

RSA Cryptosystem
  • Key generation: Alice chooses the cyclic group \(G, \cdot\) and two positive integers \(a, b\) such that, for any \(x\in G\), \(x^{ab} = x\).
    (Using the definition of \(e\) in a cyclic group above, Alice has to pick \(a,b\) so that \(ab \bmod e = 1\).)
    Alice's public key consists of the group \(G, \cdot\) and the public exponent \(b\).
    Alice's private key consists of just the secret exponent \(a\).
  • Public encryption: To encrypt a message for Alice, Bob first converts it to a group element \(x \in G\). Then he computes \(x^b\) and sends that to Alice.
  • Private decryption: To decrypt the message \(x^b\) that was sent, Alice raises it to her private exponent \(a\), which gives \[(x^b)^a = x^{ab} = x,\] where the last equality comes from the way the exponents \(a,b\) were chosen originally by Alice.
    That is, Alice raises the ciphertext \(x^b\) to her private exponent in order to recover the original message \(x\).
  • Signing: If Alice wants to sign a message \(x\), she computes \(x^a\) and sends that encryption, plus the message \(x\) itself, to Bob. Then Bob can check the decryption of \(x^a\) using the public exponent \(b\) and make sure he gets back the same original message \(x\). If so, then he knows the message really came from Alice who knows the secret exponent \(a\).

Based on the key generation property that \(x^{ab} = x\) for any \(x\in G\), you can see that this definitely works correctly in all cases. The question is, again, what group do we choose? We first need to have a way of getting this cyclic exponent \(e\) so that Alice can generate the public/private exponents \(a,b\) during key generation such that \(ab \bmod e = 1\). But we also need security, so that anyone who knows the group and who knows the public exponent \(b\) won't be able to figure out \(a\).

Modular groups again

The group that works again is integers modulo \(n\) for some carefully-chosen modulus \(n\). As it turns out, this group \(\mathbb{Z}_n^*\) is always cyclic, and the cyclic exponent \(e\) is always equal to the number of elements in the group itself.

And remember from before that the number of elements in \(\mathbb{Z}_n^*\) is all the integers less than \(n\) that don't have any shared factors with \(n\). This value is so important that it has it's own name: the totient of \(n\), usually written \(\varphi(n)\). The relationship to cyclic groups has been known since the 17th century:

Fermat's Little Theorem
For any positive integer \(n\), define \(\varphi(n)\) to be the number of integers in \(\{1,2,\ldots,n-1\}\) that do not have any common factors with \(n\).
Then, for any \(x\in \mathbb{Z}_n^*\), \(x^{\varphi(n)} \bmod n = 1\).

This means we can do RSA encryption in any modular group, as long as we can compute \(\varphi(n)\) for the modulus. Alice's key generation algorithm looks like this:

  1. Choose the modulus \(n\)
  2. Compute \(\varphi(n)\)
  3. Pick any public exponent \(b\) that doesn't have any common factors with \(\varphi(n)\)
  4. Compute the private exponent \(a\) by finding the modular inverse of \(b\) mod \(\varphi(n)\)
  5. Publish \(n,b\) as the public key and keep \(a\) secret

We already know from before that computing the private exponent \(a\) will be easy once you know \(\varphi(n)\) and \(b\). So now the challenge is: how to pick a modulus \(n\) so that Alice can figure out \(\varphi(n)\) but no one else can?

The first idea is to make the \(n\) prime like we did with Diffie-Hellman. In this case, \(\varphi(n) = n-1\) whenever \(n\) is prime, so Alice would be able to generate the keys no problem. But this would be horrible for security, because anyone else who knows \(n\) (it's part of the public key!) would also be able to compute \(\varphi(n)\) and figure out the private exponent \(a\).

So instead of a prime modulus, we use what's called a semi-prime modulus \(n=pq\) that is the product of two large prime numbers \(p\) and \(q\). In this case, \(\varphi(n) = (p-1)(q-1)\). So anyone that knows the factorization \(n=pq\) can figure out \(\varphi(n)\), but if you just know the modulus \(n\), you would need to factor it first to compute \(\varphi(n)\). Alice will therefore choose \(p\) and \(q\) as random prime numbers during key generation, but only publish the product \(n=pq\) as part of the public key.

RSA Key Generation
  1. Pick two large random prime numbers \(p, q\)
  2. Set \(n = pq\) and also compute \(\varphi(n) = (p-1)(q-1)\)
  3. Pick any public exponent \(b\) that doesn't have any common factors with \(\varphi(n)\)
  4. Compute the private exponent \(a\) by finding the modular inverse of \(b\) mod \(\varphi(n)\)
  5. Publish \(n,b\) as the public key and keep \(a\) secret

Security

With RSA, the Evil Eavesdropper Eve gets to see Alice's public key, which consists of the modulus \(n\) as well as the public exponent \(b\). The question is, how hard would it be for Eve to figure out Alice's private exponent \(a\), given this information?

Remember how Alice computes \(a\) herself: since Alice knows the primes \(p,q\) that were multiplied to get \(n\), Alice knows that \(\varphi(n) = (p-1)(q-1)\), and then computes \(a\) as the modular inverse of \(b\) modulo \(\varphi(n)\).

As it turns out, this is essentially the only way to get \(b\) — you have to first know the prime factorization of the modulus \(n\) as \(n=pq\). So we can summarize the hardness of cracking RSA as solving a math problem:

Integer Factorization
Given an integer \(n\), find a non-trivial factor \(p\) such that \(p\) divides \(n\) and \(2 \le p \le n-1\).

When \(n\) is the product of two large \(m\)-bit primes, there are no fast algorithms to compute the factors. More precisely, no one has come up with any way of finding the factors whose running time is \(O(m^k)\) for any constant exponent \(k\). That means that factoring is slower than \(O(m)\) time, slower than \(O(m^2)\) time, even slower than \(O(m^{1000})\) time!

Something really interesting happened here, which is the whole basis for why Alice can (relatively) quickly her public and private key with a few thousand bits, while it would take lifetimes using the world's most powerful computer to crack such a key: It's fast to multiply two integers, but much, much slower to "undo" this multiplication and factor the result.

Cryptographers call this kind of thing a trapdoor function because it's easy to go one way but hard to go back. Integer multiplication/factorization is the trapdoor used for RSA cryptography, but there are also many other cryptographic trapdoors that are used for other cryptographic protocols.

Another way to do key exchange

It turns out that public key encryption can do key exchange quite easily. If Alice has a keypair \((pub, priv)\), she can start by sending her public key \(pub\) to Bob. Bob then generates a key \(k\) that he and Alice will share at the end of the exchange. If we use \(\mathcal{E}_x(m)\) to mean encryption of message \(m\) with key \(x\), then Bob sends back to Alice \(\mathcal{E}_{pub}(k)\). Since it is encrypted with Alice's public key, she can decrypt it at the other end with her private key and now both Bob and Alice know \(k\). Eve will not learn \(k\) because she doesn't have Alice's private key.

So public key encryption is somehow a more powerful idea than key exchange: If you can do public-key crypto, you can also do key exchange using the same protocol. This is why it makes sense that RSA is somewhat more complicated than Diffie-Hellman. Even though we don't technically need Diffie-Hellman once we know about RSA, Diffie-Hellman is still useful because it is much simpler (and therefore faster!) than generating RSA public/private key pairs.