SD 212 Spring 2024 / Homeworks


hw24: Friend words

  • Due before the beginning of class on Wednesday, March 27

The Data

Download this list of 10,000 common English words, or type the following in your terminal to get it directly:

wget "https://roche.work/212/hw/parallel/10k-words.txt"


Starter code

This program looks through that list of words and prints out all pairs of words that are “friends”, meaning that the two words are exactly the same except for their 2nd and 6th letters. For example, “mAssiVe” and “mIssiLe” are friends.
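To make the definition concrete, here is a minimal sketch that checks it letter by letter instead of with a regular expression. It assumes, as the starter code's pattern does, that the 2nd and 6th letters must both actually differ and that the words have the same length. The name is_friend_pair is hypothetical and not part of the starter code.

```python
def is_friend_pair(word1, word2):
    """Hypothetical helper: True if the words differ at exactly indices 1 and 5."""
    if len(word1) != len(word2) or len(word1) < 6:
        return False
    for k in range(len(word1)):
        same = word1[k] == word2[k]
        if k in (1, 5):
            if same:        # these two positions must differ
                return False
        elif not same:      # every other position must agree
            return False
    return True

print(is_friend_pair("massive", "missile"))   # True
print(is_friend_pair("massive", "massile"))   # False: 2nd letters are both 'a'
print(is_friend_pair("massive", "passive"))   # False: 1st letters differ
```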

import re

# Read in words from file into a single list
words = [line.strip() for line in open('10k-words.txt')]
assert len(words) == 10000

def friends(word1, word2):
    """Determines if the two strings are the same except for index 1 and 5.
    For example, 'massive' and 'missile'.
    """
    if len(word1) < 6:
        return False
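    # Build a regex from word1: every letter must match exactly, except at
    # indices 1 and 5, where [^x] requires word2 to have a different letter.
    pattern = f"^{word1[:1]}[^{word1[1]}]{word1[2:5]}[^{word1[5]}]{word1[6:]}$"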
    return re.match(pattern, word2)

def check_words(start_index, end_index):
    """Finds friend words for all words in the given index range."""
    for i in range(start_index, end_index):
        for j in range(i+1, len(words)):
            if friends(words[i], words[j]):
                print(words[i], words[j])

if __name__ == '__main__':
    # There are 10000 words in the list, so this function call gets them all
    check_words(0, 10000)

    print("all friends found")

You should copy-paste this code into a new file friends.py, or run this command to download it directly:

wget "https://roche.work/212/hw/parallel/friends.py"

Your task

If you run friends.py as-is, it correctly finds all friend words, but it takes a while: probably somewhere between 30 seconds and 1 minute.

Improve the code so that it uses Python's multiprocessing module to make 5 parallel calls to check_words. The new version should produce all the same results (maybe in a different order), but run faster.
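As a rough illustration of the general pattern, and not the required solution, the sketch below splits an index range into 5 contiguous chunks and runs one process per chunk. The names split_range and report_range are hypothetical; report_range is only a stand-in for a function that takes a start and end index, the way check_words does.

```python
from multiprocessing import Process

def report_range(start_index, end_index):
    """Stand-in worker: a real worker would process indices in [start_index, end_index)."""
    print(f"working on {start_index} to {end_index}")

def split_range(total, parts):
    """Split range(total) into `parts` contiguous (start, end) chunks."""
    step = total // parts
    bounds = [k * step for k in range(parts)] + [total]
    return list(zip(bounds[:-1], bounds[1:]))

if __name__ == '__main__':
    # One process per chunk; each gets its own (start, end) arguments
    procs = [Process(target=report_range, args=chunk)
             for chunk in split_range(10000, 5)]
    for p in procs:
        p.start()
    # Wait for every process to finish before reporting completion
    for p in procs:
        p.join()
    print("all chunks done")
```

Equal-sized chunks are the simplest choice, but they are not necessarily equal work: in the starter code the inner loop starts at i+1, so chunks with smaller indices perform more comparisons than later ones.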

Submit command

To submit files for this homework, run one of these commands:

submit -c=sd212 -p=hw24 friends.py
club -csd212 -phw24 friends.py