hw31: Reading about sklearn
- Due before the beginning of class on Monday, April 24
In this unit, we have been grasping the terminology of machine learning and seeing how we can apply it with Python’s sklearn library.
While this is a challenging subject to dive into for the first time, it’s also a popular one with a lot of great documentation. Today’s reading will reinforce what we saw in class about unsupervised learning and give a preview of how supervised learning works as well.
Reading
Python Data Science Handbook Chapter 38: Introducing Scikit-Learn
Read everything until the section on Applications to handwritten digit recognition. Some of it should be familiar and sort of review from class, and some of it will be new.
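As a warm-up for the reading, here is a minimal sketch of the general choose-a-model, fit, predict pattern that sklearn estimators follow. It uses a simple linear regression on made-up noisy data; the data and the names rng, x, X, y, X_new, and y_new are assumptions for illustration only and are not part of the homework.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y is roughly 2x - 1 plus noise.
rng = np.random.default_rng(42)
x = 10 * rng.random(50)
y = 2 * x - 1 + rng.normal(scale=0.5, size=50)

# sklearn expects a 2D input array, so reshape the 1D x into a single column.
X = x[:, np.newaxis]

# Step 1: choose a model class and set its hyperparameters.
model = LinearRegression(fit_intercept=True)

# Step 2: fit the model to the known inputs and targets.
model.fit(X, y)

# Step 3: predict targets for new inputs the model has not seen.
X_new = np.linspace(-1, 11, 5)[:, np.newaxis]
y_new = model.predict(X_new)

print(X.shape, y.shape)
print(model.coef_, model.intercept_)
print(y_new)
```

The learned slope and intercept should land close to the 2 and -1 used to generate the data, though the exact values depend on the random noise.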
Questions
Download the file hw31.md to fill in and submit for this homework.
- How much did you read carefully? (Answer with just the letter of your choice.)
  - The entire chapter
  - The beginning up to the last section on digit recognition
  - Some of it
  - None of it too carefully
- What does each row of the input array to a machine learning model represent?
  - The algorithm parameters
  - The algorithm hyperparameters
  - A single observation or sample
  - A single feature or attribute
  - The labels for each observation
 
- What does each column represent?
  - The algorithm parameters
  - The algorithm hyperparameters
  - A single observation or sample
  - A single feature or attribute
  - The labels for each observation
 
- Suppose I have three variables:
  - A 2D array tested containing information about a bunch of drugs that have been tested in the lab,
  - A 1D array results with an indication (1 or 0) of whether each tested drug was effective, and
  - Another 2D array untested containing the same information about a few drugs that haven’t been tried out yet.

  We want to use machine learning to predict whether each untested drug will be effective. What kind of machine learning problem is this? (Select all letters that apply.)
  - Supervised learning
  - Unsupervised learning
  - Classification
  - Regression
  - Clustering
 
- In the same setup as the previous problem, complete the code below so that it actually makes the predictions. There are three missing steps; for each of the next three problems, select which line of code should go in for that step.

  Here is the incomplete code:

      from sklearn.naive_bayes import GaussianNB

      tested = ...    # big 2D array of numbers
      results = ...   # 1D array of 1/0
      untested = ...  # smaller 2D array of numbers

      # QUESTION 4 step
      # QUESTION 5 step
      # QUESTION 6 step

      print(predictions)

  What line of code should be filled in for QUESTION 4 STEP?
  - model = GaussianNB()
  - fit = naive_bayes()
  - model = np.linspace(-1, 11)
  - model = PCA(n_components=2)
 
- What line of code should be filled in for QUESTION 5 STEP?
  - model.fit(tested)
  - fit.model(results)
  - model.fit(tested, untested)
  - model.fit(tested, results)
  - model.fit(untested, tested)
 
- What line of code should be filled in for QUESTION 6 STEP?
  - predictions = model.labels_
  - predictions = fit.predict(results)
  - predictions = fit.predict(untested, results)
  - predictions = model.predict(untested)
  - predictions = model.predict(results)
 
Submit command
To submit files for this homework, run one of these commands:
submit -c=sd212 -p=hw31 hw31.md 
club -csd212 -phw31 hw31.md