SD 212 Spring 2024 / Homeworks


hw34: How much could USNA charge for tuition?

  • Due before the beginning of class on Wednesday, April 24

References

This homework is based on our current unit on sklearn.

Look back at your notes and examples we saw in class.

Two parts of the sklearn user’s guide might also be useful:

Data

The data for this homework comes from the College Scorecard, published by the U.S. Department of Education.

We have kindly cut down the original data in two ways: by removing thousands of non-numeric or rarely-reported columns, and by removing thousands of trade schools and other institutions not really comparable to typical 4-year colleges.

The (cleaned) dataset is here: schools.csv

Your task

The CSV file you downloaded contains information on roughly 1000 colleges and universities in the U.S. All columns except the school name, INSTNM, are numeric.

The last column, TUITIONFEE_OUT, reports the average out-of-state tuition and fees per year. For four schools, this value is missing.
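One possible way to separate the schools with a known tuition from those without is sketched below. It uses a tiny made-up table rather than schools.csv: only INSTNM and TUITIONFEE_OUT are real column names, while FEATURE_1, FEATURE_2, the school names, and all the numbers are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for schools.csv. Only INSTNM and TUITIONFEE_OUT are
# real column names; everything else here is hypothetical.
df = pd.DataFrame({
    "INSTNM": ["School A", "School B", "School C", "School D"],
    "FEATURE_1": [0.2, 0.5, 0.9, 0.4],
    "FEATURE_2": [1200.0, 800.0, 1500.0, 1000.0],
    "TUITIONFEE_OUT": [30000.0, np.nan, 45000.0, np.nan],
})

# Boolean mask: True where the tuition value is missing.
missing = df["TUITIONFEE_OUT"].isna()
known = df[~missing]     # rows that can be used for training
unknown = df[missing]    # rows whose tuition has to be predicted

# Every column except the name and the target is treated as a feature.
feature_cols = [c for c in df.columns if c not in ("INSTNM", "TUITIONFEE_OUT")]
X_known = known[feature_cols]
y_known = known["TUITIONFEE_OUT"]
X_unknown = unknown[feature_cols]

print(list(unknown["INSTNM"]))
```

With the real file, the DataFrame would presumably come from pd.read_csv instead of being typed in by hand.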

Your task is to write a program tuition.py which uses Ridge regression in sklearn to predict what the tuition and fees should be for these four schools.

When I run your code, it should print to the terminal 4 lines containing the name of each school and the predicted tuition and fees amount, like this:

United States Air Force Academy 48261.53989866722
United States Coast Guard Academy 50057.64509707588
United States Naval Academy ?????
United States Military Academy ?????

(Of course, your code will not have ????? but the actual numbers there!)
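The output format above is just the school name, a space, and the predicted number. A minimal sketch of producing lines in that shape, using hypothetical names and made-up predictions rather than real results:

```python
# Hypothetical school names and made-up predictions, for illustration only.
names = ["Example College", "Sample University"]
predictions = [41234.5, 38765.25]

# One line per school: name, a space, then the predicted amount.
lines = [f"{name} {pred}" for name, pred in zip(names, predictions)]
for line in lines:
    print(line)
```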

Regression algorithm

Use ridge regression with the default alpha value of 1.0.
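In sklearn, alpha controls how strongly large coefficients are penalized, and Ridge() with no arguments already uses alpha=1.0. The short check below confirms that the two ways of constructing the model agree; the variable names are just for illustration.

```python
from sklearn.linear_model import Ridge

# Ridge() with no arguments uses the default regularization strength.
default_model = Ridge()
# Spelling the value out is equivalent and makes the choice visible.
explicit_model = Ridge(alpha=1.0)

print(default_model.alpha, explicit_model.alpha)
```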

Scaling

If you run your code and get an error like

LinAlgWarning: Ill-conditioned matrix (rcond=3.23596e-22): result may not be accurate.

then it means you forgot to scale the features. The easiest way to do this is with a Pipeline whose first step is a StandardScaler() and whose second step is a Ridge().

There is an example in the notes for this unit and another example in the documentation here.
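As a rough illustration of that kind of Pipeline, here is a sketch on synthetic data (not the schools dataset). The two feature columns are deliberately on very different scales; the step names "scale" and "ridge", the random data, and the coefficients are all assumptions made up for this example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: one feature in [0, 1], another in [0, 1e6].
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1, 50), rng.uniform(0, 1e6, 50)])
y = 3.0 * X[:, 0] + 2e-6 * X[:, 1] + rng.normal(0, 0.01, 50)

# Step 1 standardizes each column; step 2 fits ridge regression
# (default alpha=1.0) on the standardized columns.
model = Pipeline([
    ("scale", StandardScaler()),
    ("ridge", Ridge()),
])
model.fit(X, y)

# predict() applies the same scaling learned during fit() before
# handing the rows to the ridge model.
preds = model.predict(X[:4])
print(preds)
```

Because the scaler lives inside the pipeline, the rows being predicted are scaled with the statistics learned from the training rows, so there is no separate transform step to forget.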

Questions

In addition to your code, you also need to complete and submit a short markdown file answering these questions.

Download the file hw34.md, fill it in, and submit it for this homework:
  1. What is the predicted tuition for USNA (rounded to the nearest dollar)?

  2. What is the predicted tuition for West Point (also rounded to the nearest dollar)?

  3. What conclusions (if any) can you draw from these results?

Submit command

To submit files for this homework, run one of these commands:

submit -c=sd212 -p=hw34 hw34.md tuition.py
club -csd212 -phw34 hw34.md tuition.py