hw33: Reading about sklearn
- Due before the beginning of class on Monday, April 22
In this unit, we have been learning the terminology of machine learning and seeing how to apply it with Python's sklearn (scikit-learn) library.
This is a challenging subject to dive into for the first time, but because it is so popular, it also has a lot of great documentation. Today's reading will reinforce what we saw in class about unsupervised learning and give a preview of how supervised learning works.
Reading
Python Data Science Handbook Chapter 38: Introducing Scikit-Learn
Read everything up to the section on Applications to handwritten digit recognition. Some of it should be familiar as review from class, and some of it will be new.
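As a small refresher on the unsupervised workflow the reading describes, here is a minimal sketch. It roughly follows the reading's steps: pick an estimator class, set its hyperparameters when you create it, arrange the data as a features matrix, call fit, then apply the fitted model. The toy data and the variable names X and X_2d are made up for illustration. PCA is only one example of an unsupervised estimator.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical toy data: a features matrix with 6 rows and 3 columns
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(3, 3)),  # three samples near 0
    rng.normal(loc=5.0, scale=0.5, size=(3, 3)),  # three samples near 5
])

# Choose the model class and set its hyperparameters when instantiating it
pca = PCA(n_components=2)

# Unsupervised learning: fit sees only the features matrix, with no labels
pca.fit(X)

# Apply the fitted model to the data (here, reduce 3 columns down to 2)
X_2d = pca.transform(X)
print(X.shape, X_2d.shape)  # (6, 3) (6, 2)
```

The number of rows stays the same and only the number of columns changes. That shows which dimension of the array PCA is reducing.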
Questions
Download the file hw33.md to fill in and submit for this homework.
How much did you read carefully?
- The entire chapter
- The beginning up to the last section on digit recognition
- Some of it
- None of it too carefully
(Answer with just the letter of your choice.)
What does each row of the input array to a machine learning model represent?
- The algorithm parameters
- The algorithm hyperparameters
- A single observation or sample
- A single feature or attribute
- The labels for each observation
What does each column represent?
- The algorithm parameters
- The algorithm hyperparameters
- A single observation or sample
- A single feature or attribute
- The labels for each observation
Suppose I have three variables:
- A 2D array named tested, containing information about a bunch of drugs that have been tested in the lab,
- A 1D array named results, with an indication (1 or 0) of whether each tested drug was effective, and
- Another 2D array named untested, containing the same information about a few drugs that haven't been tried out yet.
We want to use machine learning to predict whether each untested drug will be effective.
What kind of machine learning problem is this?
(Select all letters that apply.)
- Supervised learning
- Unsupervised learning
- Classification
- Regression
- Clustering
Using the same setup as the previous problem, complete the code below so that it actually makes these predictions. There are three missing steps. For each of the next three problems, select the line of code that belongs in that step.
Here is the incomplete code:
    from sklearn.naive_bayes import GaussianNB

    tested = ...    # big 2D array of numbers
    results = ...   # 1D array of 1/0
    untested = ...  # smaller 2D array of numbers

    # QUESTION 4 step
    # QUESTION 5 step
    # QUESTION 6 step

    print(predictions)
What line of code should be filled in for the QUESTION 4 step?
- model = GaussianNB()
- fit = naive_bayes()
- model = np.linspace(-1, 11)
- model = PCA(n_components=2)
What line of code should be filled in for the QUESTION 5 step?
- model.fit(tested)
- fit.model(results)
- model.fit(tested, untested)
- model.fit(tested, results)
- model.fit(untested, tested)
What line of code should be filled in for the QUESTION 6 step?
- predictions = model.labels_
- predictions = fit.predict(results)
- predictions = fit.predict(untested, results)
- predictions = model.predict(untested)
- predictions = model.predict(results)
Submit command
To submit files for this homework, run one of these commands:
submit -c=sd212 -p=hw33 hw33.md
club -csd212 -phw33 hw33.md