hw24: Friend words
- Due before the beginning of class on Wednesday, March 27
The Data
Download this list of 10,000 common English words, or type the following in your terminal to get it directly:
wget "https://roche.work/212/hw/parallel/10k-words.txt"
(source)
Starter code
This program looks through that list of words and prints out all pairs
of words that are “friends”, meaning that the two words are identical
except that their 2nd and 6th letters are different. For example,
“mAssiVe” and “mIssiLe” are friends.
import re

# Read in words from file into a single list
with open('10k-words.txt') as infile:
    words = [line.strip() for line in infile]
assert len(words) == 10000


def friends(word1, word2):
    """Returns True if the two strings are the same except that they differ
    at index 1 and at index 5. For example, 'massive' and 'missile'.
    """
    if len(word1) < 6:
        return False
    # Same first letter, any letter but word1[1], same letters at indices 2-4,
    # any letter but word1[5], same remaining letters, then end of string.
    pattern = f"^{word1[:1]}[^{word1[1]}]{word1[2:5]}[^{word1[5]}]{word1[6:]}$"
    return re.match(pattern, word2) is not None


def check_words(start_index, end_index):
    """Finds friend words for all words in the given index range."""
    for i in range(start_index, end_index):
        # Only compare with later words, so each pair is checked once
        for j in range(i+1, len(words)):
            if friends(words[i], words[j]):
                print(words[i], words[j])


if __name__ == '__main__':
    # There are 10000 words in the list, so this function call gets them all
    check_words(0, 10000)
    print("all friends found")
You should copy-paste this code into a new file friends.py, or run
this command to download it directly:
wget "https://roche.work/212/hw/parallel/friends.py"
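The regular expression in friends() encodes the friend rule position by position. To make that rule concrete, here is a hypothetical regex-free version of the same test (the name friends_by_letters is made up for illustration, and it assumes ordinary letter-only words, where no character has special meaning in a regex). It compares the two words one character at a time:

```python
def friends_by_letters(word1, word2):
    """Hypothetical regex-free equivalent of friends(): True if both words have
    the same length (at least 6), differ at indices 1 and 5, and match elsewhere."""
    if len(word1) < 6 or len(word1) != len(word2):
        return False
    for k, (a, b) in enumerate(zip(word1, word2)):
        if k in (1, 5):
            if a == b:  # the 2nd and 6th letters must differ
                return False
        elif a != b:  # every other letter must match
            return False
    return True


print(friends_by_letters('massive', 'missile'))  # True
print(friends_by_letters('massive', 'passive'))  # False
```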
Your task
If you run friends.py as-is, it correctly finds all friend words, but it
takes a little while: probably about 30 seconds to 1 minute.
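Most of that time comes from the nested loops in check_words: word i is compared with each of the n - 1 - i words after it, so one full run makes n(n-1)/2 = 49,995,000 calls to friends for n = 10,000. The short sketch below counts those calls (the helper comparisons is hypothetical and assumes the 10,000-word list). It also shows that cutting the indices into 5 equal-width ranges would not cut the work into equal pieces.

```python
def comparisons(start, end, n=10000):
    """Number of friends() calls made by check_words(start, end) on an n-word list."""
    # Word i is compared with words i+1 .. n-1, i.e. n - 1 - i of them.
    return sum(n - 1 - i for i in range(start, end))


print(comparisons(0, 10000))  # 49995000
for start in range(0, 10000, 2000):
    print(start, start + 2000, comparisons(start, start + 2000))
# 0 2000 17999000
# 2000 4000 13999000
# 4000 6000 9999000
# 6000 8000 5999000
# 8000 10000 1999000
```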
Improve the code so that it uses Python multiprocessing to make 5 parallel calls to check_words. It should find all the same results (possibly in a different order), but run faster.
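Here is a minimal sketch of the general pattern, assuming the standard-library multiprocessing.Pool. The index range is cut into 5 contiguous (start, end) pieces, and Pool.starmap runs one worker call per piece in separate processes. The names split_ranges and count_in_range are hypothetical. count_in_range is only a stand-in for a worker such as check_words, and the timing lines are there so that sequential and parallel runs can be compared.

```python
import multiprocessing
import time


def split_ranges(n, k):
    """Split the indices 0..n-1 into k contiguous (start, end) pairs
    whose sizes differ by at most 1."""
    base, extra = divmod(n, k)
    ranges = []
    start = 0
    for i in range(k):
        # The first `extra` chunks get one additional index each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def count_in_range(start, end):
    """Hypothetical stand-in for a worker such as check_words(start, end)."""
    return end - start


def main():
    chunks = split_ranges(10000, 5)
    began = time.perf_counter()
    # starmap unpacks each (start, end) tuple into count_in_range(start, end);
    # with processes=5, all five calls can run at the same time.
    with multiprocessing.Pool(processes=5) as pool:
        results = pool.starmap(count_in_range, chunks)
    elapsed = time.perf_counter() - began
    print(chunks)
    print(sum(results), f"indices handled in {elapsed:.3f} s")


if __name__ == '__main__':
    # The guard matters: on platforms that start workers with "spawn",
    # each worker re-imports this file and must not start a new pool.
    main()
```

Printing from separate worker processes is not coordinated, so output lines may come out in a different order than in the sequential run. Also, because check_words compares word i only with later words, ranges near index 0 do more comparisons than ranges near the end, so equal-width ranges may not give a full 5x speedup.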
Submit command
To submit files for this homework, run one of these commands:
submit -c=sd212 -p=hw24 friends.py
club -csd212 -phw24 friends.py