This is the archived website of SD 212 from the Spring 2023 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

hw31: Reading about sklearn

Due before the beginning of class on Monday, April 24


In this unit, we have been learning the terminology of machine learning and seeing how to apply it with Python’s sklearn (scikit-learn) library.

This is a challenging subject to dive into for the first time, but it is also very popular, so there is a lot of great documentation available. Today’s reading will reinforce what we saw in class about unsupervised learning and give a preview of how supervised learning works as well.
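As a quick refresher on the unsupervised workflow from class, here is a minimal sketch of a common sklearn pattern: choose a model and its hyperparameters, fit it to data, then read off what it learned. It assumes NumPy and scikit-learn are installed. The toy data and the choice of `KMeans` with two clusters are illustrative assumptions, not part of the assignment.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data (an assumption for illustration):
# six 2D points that fall into two well-separated groups.
X = np.array([
    [1.0, 1.1],
    [0.9, 1.0],
    [1.2, 0.8],
    [8.0, 8.2],
    [7.9, 8.1],
    [8.3, 7.7],
])

# Step 1: choose a model class and set its hyperparameters.
model = KMeans(n_clusters=2, n_init=10, random_state=0)

# Step 2: fit the model. This is unsupervised, so only X is passed (no labels).
model.fit(X)

# Step 3: read the learned result, which is one cluster label per point.
labels = model.labels_
print(labels)
```

KMeans numbers its clusters arbitrarily, so the printed labels may come out as all 0s followed by all 1s, or the reverse. What matters is that the first three points share one label and the last three share the other.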

Reading

Python Data Science Handbook Chapter 38: Introducing Scikit-Learn

Read everything up until the section on Applications to handwritten digit recognition. Some of it should be familiar as review from class, and some of it will be new.

Questions

Download the file hw31.md, fill it in, and submit it for this homework.

How much did you read carefully?
a. The entire chapter
b. The beginning up to the last section on digit recognition
c. Some of it
d. None of it too carefully
(Answer with just the letter of your choice.)

What does each row of the input array to a machine learning model represent?
a. The algorithm parameters
b. The algorithm hyperparameters
c. A single observation or sample
d. A single feature or attribute
e. The labels for each observation

What does each column represent?
a. The algorithm parameters
b. The algorithm hyperparameters
c. A single observation or sample
d. A single feature or attribute
e. The labels for each observation

Suppose I have three variables:
- A 2D array `tested` containing information about a bunch of drugs that have been tested in the lab,
- A 1D array `results` indicating (1 or 0) whether each tested drug was effective, and
- Another 2D array `untested` containing the same information about a few drugs that haven’t been tried out yet.

We want to use machine learning to predict whether each untested drug will be effective.

What kind of machine learning problem is this?

(Select all letters that apply.)
a. Supervised learning
b. Unsupervised learning
c. Classification
d. Regression
e. Clustering

In the same setup as the previous problem, complete the code below so that it actually makes these predictions. There are three missing steps; for each of the next three problems, select which line of code should go in that step.

Here is the incomplete code:
```python
from sklearn.naive_bayes import GaussianNB
tested = ... # big 2D array of numbers
results = ... # 1D array of 1/0
untested = ... # smaller 2D array of numbers

# QUESTION 4 step
# QUESTION 5 step
# QUESTION 6 step

print(predictions)
```

What line of code should be filled in for QUESTION 4 STEP?
a. `model = GaussianNB()`
b. `fit = naive_bayes()`
c. `model = np.linspace(-1, 11)`
d. `model = PCA(n_components=2)`

What line of code should be filled in for QUESTION 5 STEP?
a. `model.fit(tested)`
b. `fit.model(results)`
c. `model.fit(tested, untested)`
d. `model.fit(tested, results)`
e. `model.fit(untested, tested)`

What line of code should be filled in for QUESTION 6 STEP?
a. `predictions = model.labels_`
b. `predictions = fit.predict(results)`
c. `predictions = fit.predict(untested, results)`
d. `predictions = model.predict(untested)`
e. `predictions = model.predict(results)`

Submit command

To submit files for this homework, run one of these commands:

submit -c=sd212 -p=hw31 hw31.md 
club -csd212 -phw31 hw31.md
