SD 212 Spring 2024 / Homeworks


hw19: Author ages

  • Due before the beginning of class on Wednesday, March 6

The Data

Here are two csv files for you to download:

  • banned.csv contains information about 25 commonly banned books written in the 20th century
  • authors.csv contains information about some 20th century authors

Your task

It seems 38 was a really good age for authors to write famous and controversial novels.

Write a Python program, ages.py, that prints the titles of the books (from banned.csv) whose authors were 38 years old when the book was published, sorted by the year of publication.

(For the purposes of this assignment, don’t worry about dates within the year; just assume the author’s age is the year the book was published minus the year the author was born.)

For the example files above, running your program should produce 4 titles, like this:

roche@ubuntu$ python3 ages.py
Brave New World
Invisible Man
Catch-22
The Color Purple

For this example, we can see from banned.csv that The Color Purple was written by Alice Walker in 1982, which is 38 years after she was born in 1944 according to authors.csv.
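
The arithmetic in this example can be checked directly. This is only a sanity check on the two years quoted above, not part of the required program; the variable names are made up for illustration.

```python
# Years taken from the example above (Alice Walker, The Color Purple).
birth_year = 1944
publication_year = 1982

# Age is approximated as a plain difference of years, ignoring months and days.
age = publication_year - birth_year
print(age)
```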

Your code must actually read the csv files and process them (don’t just hard-code the results), and should make good use of Pandas’s built-in tools whenever possible.

Hints

The challenge of this homework is to merge the two CSV files. Here is one way that you can get it done:

  • Read in both CSV files using pd.read_csv, to create two DataFrame variables.
  • The common column to merge on is the author’s name, but that column has a different title (from the header line) in each of the two CSV files. Rename the column in one of the two dataframes so that the column names match.
  • Call pd.merge to combine the two dataframes based on author names, creating a new DataFrame that has all of the information about each book including the author’s birth year.
  • Subtract the birth year column from the book’s publication year column to create a new column, age, and add it to the dataframe.
  • Sort the dataframe by the year the book was written using Pandas’s sort_values method.
  • Select the rows of the dataframe where the age equals 38.
  • Loop over these rows with age 38 using iterrows() and print out each book title.
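
The steps above might be sketched as follows, using tiny in-memory stand-ins for the two files. The column names (Title, Author, Year, Name, Born) and the "Example" rows are assumptions invented for illustration; only the Alice Walker years come from this page. Check the header lines of the real CSV files and adapt the names accordingly.

```python
import io
import pandas as pd

# Hypothetical stand-ins for banned.csv and authors.csv.
# Column names and the "Example ..." rows are invented for this sketch.
banned_csv = io.StringIO(
    "Title,Author,Year\n"
    "Example Late Book,Example Author,1990\n"
    "The Color Purple,Alice Walker,1982\n"
    "Example Early Book,Example Author,1958\n"
)
authors_csv = io.StringIO(
    "Name,Born\n"
    "Alice Walker,1944\n"
    "Example Author,1920\n"
)

books = pd.read_csv(banned_csv)
authors = pd.read_csv(authors_csv)

# Make the author-name columns match so they can serve as the merge key.
authors = authors.rename(columns={"Name": "Author"})

# After the merge, each book row also carries its author's birth year.
merged = pd.merge(books, authors, on="Author")

# Age is the difference of the two year columns.
merged["age"] = merged["Year"] - merged["Born"]

# Sort by publication year, then keep only the rows where age is 38.
merged = merged.sort_values("Year")
selected = merged[merged["age"] == 38]

for _, row in selected.iterrows():
    print(row["Title"])
```

With the real files, the two io.StringIO objects would be replaced by the file names passed to pd.read_csv.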

Again, remember this is just one possible approach of many!
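
As one example of a different route, pd.merge can match key columns that have different names through its left_on and right_on parameters, and a boolean mask can replace the explicit row loop. The sketch below is only an illustration: the column names (Title, Author, Year, Name, Born) and the "Example" rows are invented, and the real files may be laid out differently.

```python
import pandas as pd

# Hypothetical data: column names and "Example ..." rows are invented.
books = pd.DataFrame({
    "Title": ["The Color Purple", "Example Early Book", "Example Late Book"],
    "Author": ["Alice Walker", "Example Author", "Example Author"],
    "Year": [1982, 1958, 1990],
})
authors = pd.DataFrame({
    "Name": ["Alice Walker", "Example Author"],
    "Born": [1944, 1920],
})

# left_on/right_on name the key column on each side, so no renaming is needed.
merged = books.merge(authors, left_on="Author", right_on="Name")

# Filter on the computed age, sort by year, and pull out just the titles.
is_38 = (merged["Year"] - merged["Born"]) == 38
titles = merged[is_38].sort_values("Year")["Title"].to_list()

print("\n".join(titles))
```

The trade-off is that the merged dataframe keeps both name columns (Author and Name), which is harmless here but can be tidied up with drop if it bothers you.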

Submit command

To submit files for this homework, run one of these commands:

submit -c=sd212 -p=hw19 ages.py
club -csd212 -phw19 ages.py