hw34: How much could USNA charge for tuition?
- Due before the beginning of class on Wednesday, April 24
References
This homework is based on our current unit on sklearn.
Look back at your notes and the examples we saw in class.
Two parts of the sklearn user’s guide might also be useful:
Data
The data for this homework comes from the College scorecard from the U.S. Department of Education.
We have kindly cut down the original data in two ways: by removing thousands of non-numeric or rarely reported columns, and by removing thousands of trade schools and other institutions not really comparable to typical 4-year colleges.
The (cleaned) dataset is here: schools.csv
Your task
The CSV file you downloaded has information reported about around 1000 colleges and universities in the U.S. All columns except the school name INSTNM are numeric.
The last column, TUITIONFEE_OUT, reports the average (out-of-state) tuition and fees per year. For four schools, this data is missing.
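One way to separate the rows you can train on from the rows you need to predict is a boolean mask built with pandas `isna()`. This is a minimal sketch, assuming pandas is installed; it uses a tiny made-up table in place of schools.csv (the school names, the `UGDS` column, and every number are hypothetical placeholders). With the real file in your working directory you would call `pd.read_csv("schools.csv")` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for schools.csv: names, the UGDS column, and all
# values are made up for illustration only.
csv_text = """INSTNM,UGDS,TUITIONFEE_OUT
School A,1000,30000
School B,2000,
School C,1500,35000
"""
df = pd.read_csv(io.StringIO(csv_text))

# An empty field is read as NaN, so isna() flags the rows to predict.
missing = df["TUITIONFEE_OUT"].isna()
train = df[~missing]
to_predict = df[missing]

# Features are every column except the text name and the target itself.
X_train = train.drop(columns=["INSTNM", "TUITIONFEE_OUT"])
y_train = train["TUITIONFEE_OUT"]
X_new = to_predict.drop(columns=["INSTNM", "TUITIONFEE_OUT"])
```

Keeping `to_predict["INSTNM"]` around gives you the names to print next to each prediction.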
Your task is to write a program tuition.py which uses ridge regression in sklearn to predict what the tuition and fees should be for these four schools.
When I run your code, it should print to the terminal 4 lines containing the name of each school and the predicted tuition and fees amount, like this:
United States Air Force Academy 48261.53989866722
United States Coast Guard Academy 50057.64509707588
United States Naval Academy ?????
United States Military Academy ?????
(Of course, your code will not have ????? but the actual numbers there!)
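Python's `print(name, value)` already separates its arguments with a single space, which matches the format above. The sketch below wraps that in a small helper (the name `report` is hypothetical, not something the assignment requires) and uses placeholder names and values rather than real predictions.

```python
def report(names, predictions):
    """Print one 'name value' line per school and return the printed lines."""
    lines = []
    for name, value in zip(names, predictions):
        # Same text print(name, value) would produce: str(name), space, str(value).
        line = f"{name} {value}"
        print(line)
        lines.append(line)
    return lines


# Hypothetical placeholder values, not real model output.
report(["School X", "School Y"], [41000.25, 52000.5])
```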
Regression algorithm
Use ridge regression with the default alpha value of 1.0.
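In sklearn, `Ridge()` with no arguments already uses `alpha=1.0`, so there is nothing extra to set. A small sketch on made-up numeric data (the arrays are illustrative only, not from the dataset):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Made-up training data: 4 samples, 2 features, with y = 2*x0 + 3*x1.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([2.0, 3.0, 5.0, 7.0])

model = Ridge()  # default regularization strength alpha=1.0
model.fit(X, y)

# predict() expects a 2-D array: one row per sample to predict.
prediction = model.predict(np.array([[1.0, 2.0]]))
```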
Scaling
If you run your code and get a warning like
LinAlgWarning: Ill-conditioned matrix (rcond=3.23596e-22): result may not be accurate.
then it most likely means you forgot to scale the data. The easiest way to do this is with a Pipeline which contains a StandardScaler() as the first step and a Ridge() as the second step.
There is an example in the notes for this unit and another example in the sklearn documentation.
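A minimal sketch of such a pipeline, using `make_pipeline`, a shortcut that names the steps automatically (building `Pipeline([("scale", StandardScaler()), ("ridge", Ridge())])` by hand works the same way). The two feature columns are made up and deliberately on very different scales, loosely mimicking the mix of rates and counts in real college data; they are not taken from schools.csv.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up features on very different scales (a rate in [0, 1] and a count).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(0.0, 1.0, size=50),  # something like a rate
    rng.uniform(1e3, 5e4, size=50),  # something like an enrollment count
])
y = 20000 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 100, size=50)

# Step 1 rescales each column to mean 0 and std 1; step 2 fits ridge regression.
model = make_pipeline(StandardScaler(), Ridge())
model.fit(X, y)

# The scaling learned during fit() is applied automatically before predicting.
predictions = model.predict(X[:3])
```

Because the scaler lives inside the pipeline, you call `fit` and `predict` exactly as you would on a bare `Ridge()`.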
Questions
In addition to your code, you also need to complete and submit a short markdown file answering these questions.
Download the file hw34.md to fill in and submit for this homework.
What is the predicted tuition for USNA (rounded to the nearest dollar)?
What is the predicted tuition for West Point (also rounded to the nearest dollar)?
What conclusions (if any) can you draw from these results?
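For the rounding in the first two questions, Python's built-in `round()` with no second argument returns the nearest integer. One caveat: exact halves round to the nearest even integer (`round(2.5)` is `2`), which only matters if a prediction lands exactly on .50. The sketch uses the Air Force Academy value from the sample output above as its input.

```python
# Air Force Academy value from the sample output in this assignment.
predicted = 48261.53989866722

dollars = round(predicted)  # nearest whole dollar, returned as an int
print(f"${dollars:,}")      # thousands separator for readability
```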
Submit command
To submit files for this homework, run one of these commands:
submit -c=sd212 -p=hw34 hw34.md tuition.py
club -csd212 -phw34 hw34.md tuition.py