Final Exam Review Problems
Unit 2: Command line
Readings/notes page
What input lines would be matched by the following command?
grep -E '\bis$'
- That is
- yeah it tis
- is this right
- what is?
- not is
Answer
a,e
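The same check can be sketched in Python, assuming the five candidate lines are stored in a list (the variable names are illustrative, not part of the original problem). For this pattern, Python's re module treats \b and $ the same way grep -E does:

```python
import re

# The five candidate input lines from the question, in order (a-e).
lines = ["That is", "yeah it tis", "is this right", "what is?", "not is"]

# \b requires a word boundary before "is"; $ requires "is" to end the line.
pattern = re.compile(r"\bis$")

matched = [line for line in lines if pattern.search(line)]
print(matched)
```

"yeah it tis" fails because there is no word boundary between "t" and "is", and "what is?" fails because the line ends with "?" rather than "is".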
How would you move the file ‘image.jpg’ into a new subdirectory called images?
move images image.jpg
mv image.jpg /images/
mv image.jpg images/
move image.jpg /images
Answer
c
Which of the following pipelines correctly grabs the first 7 lines of a file ‘book.txt’ and counts the number of times the word ‘the’ appears?
head book.txt | grep -c 'the'
head -n 7 book.txt | grep -c 'the'
cut -n 7 book.txt | grep -c 'the'
head -n 7 book.txt | grep 'the'
Answer
b
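One detail worth keeping in mind: grep -c reports the number of matching lines, which can differ from the number of occurrences when a line contains the word more than once. A minimal Python sketch of the difference, using a made-up three-line text in place of book.txt:

```python
import re

# Hypothetical stand-in for the contents of book.txt.
text = "the cat saw the dog\nno match here\nthe end\n"

first_lines = text.splitlines()[:7]  # like: head -n 7

# Like grep -c 'the': count the lines that contain the substring.
matching_lines = sum(1 for line in first_lines if "the" in line)

# Count every whole-word occurrence instead.
occurrences = sum(len(re.findall(r"\bthe\b", line)) for line in first_lines)

print(matching_lines, occurrences)
```

Here two lines match, but the word appears three times in total.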
Suppose we have a file called ‘midshipmen.csv’, where each line has some information about midshipmen, including their alpha (starting with m). Write a bash command that counts the number of youngsters in this file.
Answer
grep -E ',m25[0123456789]{4},' midshipmen.csv | wc -l
"Year","Sex","Rank","Name","Count","Data_Revision_Date"
2000,Female,2,ASHLEY,2815,11/07/2022
2000,Female,3,SAMANTHA,2576,11/07/2022
2000,Female,3,JESSICA,2467,11/07/2022
2000,Female,5,JENNIFER,2256,11/07/2022
2000,Female,6,ALYSSA,2003,11/07/2022
2000,Female,7,HANNAH,1849,11/07/2022
2000,Female,3,SARAH,1847,11/07/2022
2000,Female,3,ELIZABETH,1830,11/07/2022
2000,Female,10,ALEXIS,1825,11/07/2022
Given this CSV, called ‘babies.csv’, how would you use a bash command to find how many names in this data have a rank of 3?
Answer
grep ',3,' babies.csv | wc -l
Write a bash script that takes the 5th through 8th lines of a file book.txt and places them in a new file called new.txt. (You should include the 5th and 8th lines.)
Answer
head -n 8 book.txt | tail -n 4 > new.txt
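The head/tail combination can be mirrored with list slicing in Python. This is only a sketch to show why taking 8 lines and then the last 4 yields lines 5 through 8; the ten placeholder lines stand in for book.txt:

```python
# Hypothetical stand-in for the lines of book.txt.
lines = [f"line {n}" for n in range(1, 11)]

first_eight = lines[:8]      # like: head -n 8
selected = first_eight[-4:]  # like: tail -n 4

print(selected)
```

The same result comes from the single slice lines[4:8], since Python indexes from 0 and excludes the stop position.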
Unit 3: Statistical data types
Readings/notes page
Suppose you and your friend are racing each other in a school zone. What type of statistical data best describes how fast you were traveling?
- ordinal data
- discrete numerical data
- continuous data
- nominal numerical data
Answer
c
What kind of statistical data type can be used to explain Order Of Merit (OOM)?
- numerical and continuous
- numerical and discrete
- categorical and nominal
- categorical and ordinal
Answer
d
What kind of data is your Overall Order of Merit (OOM) number?
- Ordinal
- Categorical
- Nominal
- Numerical
- Both A and B
- B and D
Answer
E
Use a csv of car information to print out a dataframe of model names, one categorical data type, and one discrete numerical data type sorted in descending order by the discrete numerical data.
model,type,seats,drivetrain,msrp,horsepower,mpg_city,weight
rainier,suv,7,all,37895.0,275.0,15.0,4600.0
rendezvous,suv,5,front,26545.0,185.0,19.0,4024.0
century,sedan,5,front,22180.0,175.0,20.0,3353.0
lesabre,sedan,5,front,26470.0,205.0,20.0,3567.0
regal,sedan,4,front,24895.0,200.0,20.0,3461.0
Answer
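import pandas as pd

cars = pd.read_csv('cars2.csv')
# Keep model (name), drivetrain (categorical), and seats (discrete numerical).
new = cars.drop(columns=['type', 'msrp', 'horsepower', 'mpg_city', 'weight'])
print(new.sort_values(by='seats', ascending=False))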
Why is it important to understand what type of data you are working with in regards to analysis and visualization of data?
Answer
When working with data, it is important to understand its statistical type so that it can be properly used and analyzed. For example, categorical data can be extremely misleading when arithmetic is done on it (averages, sums, comparisons, etc.). Without knowing whether data is categorical or numerical, math may be performed on the wrong types of data, resulting in highly misleading conclusions. Similar problems arise in visualizations. For example, nominal and numerical data can be confused, leading to graphs that imply a different connection than what is truly present. This commonly happens when a trend line suggests a correlation only because nominal (unordered) data was arranged in a particular order along an axis. With knowledge of statistical types, you are more likely to choose a form of graph that avoids this confusion.
What Python command can you use to determine how many distinct values exist and how many times they are repeated?
(Challenge: answer it using command line)
Answer
Python:
.value_counts()
Command Line:
sort | uniq -c
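Without pandas, the same tally can be sketched with the standard library's collections.Counter. The list below is an illustrative sample (the type column from the car data earlier in this review), not part of the original answer:

```python
from collections import Counter

# Illustrative sample: the "type" column from the car data above.
values = ["suv", "suv", "sedan", "sedan", "sedan"]

counts = Counter(values)

# most_common() lists each distinct value with its count, largest first.
for value, count in counts.most_common():
    print(count, value)
```

Like sort | uniq -c, this reports each distinct value once along with how many times it was seen.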
Unit 4: Regular expressions
Readings/notes page
Which of the choices will not match the regular expression N(AV*|OS*)Y? (Select all that apply)
- NAVY
- NAVVY
- NAAVVY
- NOSSY
- NOSY
Answer
C
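One way to check an answer like this is with re.fullmatch, which requires the whole string to match the pattern. A short sketch, assuming the five choices are placed in a list:

```python
import re

pattern = re.compile(r"N(AV*|OS*)Y")
candidates = ["NAVY", "NAVVY", "NAAVVY", "NOSSY", "NOSY"]

# fullmatch succeeds only if the entire string fits the pattern.
non_matching = [word for word in candidates if not pattern.fullmatch(word)]
print(non_matching)
```

NAAVVY fails because the pattern allows exactly one A, followed by any number of V characters.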
Which of the following regular expressions would match the name India.Arie? (Select all that apply.)
India.Arie
India\.Arie
.*
[A-Za-z]*[.][A-Za-z]*
[^x]+
Answer
All of them: a, b, c, d, e
Which one of these does not match the regex mid$?
- mid
- humid
- sponsor mid
- middle
Answer
d
Write a Python program to count how many 800 numbers like 1-800-XXX-XXXX are in a file called telephone.txt.
Answer
import re

f = open('telephone.txt')
count = 0
for line in f:
    # findall needs the text to search (line) as its second argument
    for n in re.findall(r'\b1-800-[0-9]{3}-[0-9]{4}\b', line):
        count += 1
f.close()
print(count)
Write a bash script with regular expressions that loops through a folder containing .txt files, and counts the number of files that contain the word Goat.
(Upper or lowercase, but only count whole words. So GOAT and goat should both count, but not goatee or scapegoat.)
Answer
for file in *.txt
do
  grep -E -i -m 1 '\bgoat\b' "$file"
done | wc -l
Write a bash command that would change the format of a date from 02/01/2023 to 2023-02-01 in a file called dates.txt.
Answer
sed -i 's/\([0-9]\{2\}\)\/\([0-9]\{2\}\)\/\([0-9]\{4\}\)$/\3-\1-\2/' dates.txt
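For comparison, here is a sketch of the same capture-group reordering in Python with re.sub. The helper name to_iso is made up for illustration, and unlike the sed answer this version is not anchored to the end of the line, so it rewrites a date wherever it appears:

```python
import re

# Three capture groups: month, day, year.
date_pattern = re.compile(r"(\d{2})/(\d{2})/(\d{4})")

def to_iso(text):
    """Rewrite MM/DD/YYYY dates as YYYY-MM-DD (hypothetical helper)."""
    # \3, \1, \2 refer back to the year, month, and day groups.
    return date_pattern.sub(r"\3-\1-\2", text)

print(to_iso("02/01/2023"))
```

The backreferences \3-\1-\2 play the same role in both tools: the groups are captured in month, day, year order and emitted in year, month, day order.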
Unit 5: Error handling
Readings/notes page
What does the following bash script do?
if grep 'CLASSIFIED' file.txt
then
  echo "REDACTED"
else
  cat file.txt
fi
- Replaces the word CLASSIFIED with the word REDACTED in file.txt
- Turns file.txt into a cat
- Prints REDACTED if the file contains the word CLASSIFIED, and otherwise displays the contents of the file
- Prints REDACTED if the file does NOT contain the word CLASSIFIED, and otherwise displays the contents of the file
Answer
c
What is the purpose of error handling in programming?
- To intentionally cause errors in a program.
- To ignore any errors that may occur in a program.
- To prevent a program from crashing when errors occur.
- To make a program run faster.
Answer
Answer: C) To prevent a program from crashing when errors occur.
Write a Python function first_line(fname) that takes a string for the name of a file, and returns a string for the first line of that file. If the file does not exist, your function should return an empty string.
Answer
def first_line(fname):
    try:
        f = open(fname)
    except FileNotFoundError:
        return ''
    fline = ''  # an empty file also yields an empty string
    for line in f:
        fline = line
        break
    f.close()
    return fline
Write a Python function called divide that takes two arguments a and b, and returns the result of dividing a by b. However, if b is equal to zero, the function should raise a ValueError with the message “Cannot divide by zero”.
Answer
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        raise ValueError("Cannot divide by zero")
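A caller can then handle the ValueError instead of letting the program crash. The sketch below repeats divide so that it runs on its own; safe_divide is a hypothetical wrapper added only for illustration:

```python
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        raise ValueError("Cannot divide by zero")

def safe_divide(a, b):
    """Hypothetical wrapper: report the error and return None instead of crashing."""
    try:
        return divide(a, b)
    except ValueError as err:
        print("Error:", err)
        return None

print(safe_divide(6, 3))
print(safe_divide(1, 0))
```

The second call prints the error message and returns None, while the program keeps running.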
Unit 6: Data cleaning
Readings/notes page
Given a dataframe df with missing values, which pandas method will replace any NaN values with a given value?
df.fillna
df.dropna
df.subna
df.isnull
Answer
df.fillna
What is the proper format to combine two dataframes together?
pd.merge(df1,df2, on = 'name')
df1.merge(df2, on = 'name')
pd.combine(df1,df2, on = 'name')
df1.combine(df2, on = 'name')
Answer
pd.merge(df1,df2, on = 'name') (the method form df1.merge(df2, on = 'name') is equivalent)
What syntax would you use to set the index of a dataframe, df, according to one of its columns entitled ‘names’?
set_index(df['names'])
df.index(['names'])
df.set_index('names')
Answer
c
What would this command do?
cut -d ',' -f3 planes.csv | sed "s/blue/red/g"
- delete the 3rd column of planes.csv and replace all “blue” with “red”
- pull only the 3rd row of planes.csv and replace all “blue” with “red”
- pull only the 3rd column of planes.csv and replace all “blue” with “red”
- delete all occurrences of “blue” and “red”, then only take out the 3rd column
Answer
c
How does one drop all columns with more than two NaN values?
- df.dropna()
- df.dropna(thresh=3)
- df.dropna(how='all')
- df.dropna(thresh=2,axis=1)
Answer
D (with axis=1, thresh=2 keeps only the columns that have at least two non-NaN values)
How would you merge three dataframes together while keeping all of the rows intact?
Answer
left2.join([right2, another], how='outer')
Given place.csv:
Abbreviation,State Name,population
AL,Alabama,10000
AK,Alaska,100000
AZ,Arizona,10000
AR,Arkansas,100000
CA,California,52000
...
and given crime.csv:
state, crime rate, deaths
AL,.32,18
AK,.12,40
AZ,.68,13
AR,.22,8
CA,.47,78
...
Using these two csv files, create a dataframe which has two additional columns: one labeled ‘crimes’, which is the number of crimes committed for each state’s population (hint: crime rate times population), and the other labeled ‘death rate’ (deaths divided by population).
Answer
import pandas as pd

place = pd.read_csv('place.csv')
new = place.rename(columns={'Abbreviation': 'state'})
# skipinitialspace strips the blanks after the commas in crime.csv's header
crime = pd.read_csv('crime.csv', skipinitialspace=True)
df = pd.merge(new, crime, on='state')
df['crimes'] = df['crime rate'] * df['population']
df['death rate'] = df['deaths'] / df['population']
print(df)
Given two dataframes:
data1 =
   Missouri  Alabama  Oregon
a       NaN      NaN     8.0
c       9.0     10.0    10.0
e      13.0     14.0    12.0
g      12.0      5.0     8.0
data2 =
    OH   NV    NY
a  1.0  2.0   NaN
c  3.0  NaN   9.0
e  NaN  6.0  11.0
replace all NaN in Ohio with 1.0 and in the other states with 6.0, and combine these dataframes to make one big dataframe called data_comb. We also have some data on New Jersey: [4.0, 5.0]. Add this to the dataframe.
Answer
import pandas as pd

data_comb = pd.merge(data2, data1, how='outer', left_index=True, right_index=True)
data_comb = data_comb.rename(columns={'OH': 'Ohio', 'NV': 'Nevada', 'NY': 'New York'})
data_comb = data_comb.fillna({'Ohio': 1.0, 'Nevada': 6.0, 'New York': 6.0, 'Missouri': 6.0, 'Alabama': 6.0})
# Rows without New Jersey data (e and g) get the same 6.0 fill value
New_Jersey = pd.Series({'a': 4.0, 'c': 5.0, 'e': 6.0, 'g': 6.0}, name='New Jersey')
data_comb = pd.concat([data_comb, New_Jersey], axis=1)
It should look like this:
   Ohio  Nevada  New York  Missouri  Alabama  Oregon  New Jersey
a   1.0     2.0       6.0       6.0      6.0     8.0         4.0
c   3.0     6.0       9.0       9.0     10.0    10.0         5.0
e   1.0     6.0      11.0      13.0     14.0    12.0         6.0
g   1.0     6.0       6.0      12.0      5.0     8.0         6.0
Two csv files:
clothes.csv:
item,size
Fortnite shirt,M
Emoji pants,L
Bronies hoodie,XXXL

coolness.csv:
item,coolpoints
Fortnite shirt,800000
Emoji pants,0.33333333
Bronies hoodie,911
Write a short python program to join clothes.csv and coolness.csv to a single DataFrame with 4 rows and three columns (item, size, and coolpoints), and print out that merged DataFrame.
Answer
import pandas as pd

clothes = pd.read_csv('clothes.csv')
cool = pd.read_csv('coolness.csv')
bigdf = pd.merge(clothes, cool, on=['item'])
print(bigdf)
Given a dataframe, df, with 5 columns, two of the columns, ‘name’ and ‘shape’, contain null values. Reorganize the dataframe so that all of the NaN values are set to 0, and the data is indexed based on the second column, entitled ‘color’.
Answer
df[['name', 'shape']] = df[['name', 'shape']].fillna(0)
df = df.set_index('color')
Unit 8: Hardware and OS
Readings/notes page
Which of the following processor instructions might be required to execute a line of Python code like x = y + 2? Select all that apply.
- Arithmetic instruction to do the addition with +
- Arithmetic instruction to do the comparison with =
- Load instruction to look up the value of x
- Load instruction to look up the value of y
- Store instruction to save the value of x
- Store instruction to save the value of y
- Control flow instruction to perform the assignment
- Logic instruction to determine the type
Answer
a, d, e
Which aspect of the memory hierarchy is considered secondary storage?
- registers
- caches
- flash disk
- main memory
Answer
c
Why do we have the memory hierarchy with faster and slower parts? Why not just store everything in the fastest type of storage like cache or registers?
Answer
The faster parts of the memory hierarchy like registers and cache are also very costly in terms of price, size, and/or power consumption, so their capacity is limited. There are typically only a few bytes of register storage available, for example. The slower parts of the memory hierarchy such as disk are also very cheap, so they can have huge capacity like terabytes of data.
What is one advantage and disadvantage of compiled languages?
Answer
An advantage is that compiled languages tend to be more efficient at run-time after compilation, and the compiler can help spot errors at the initial compilation stage before the program is actually run. A disadvantage is that every change to the code requires a separate compilation step before the program can be run again.
Unit 9: Concurrency
Readings/notes page
Which is true regarding multiprocessing? Select all correct answer(s):
- Each process has a copy of global variables
- Not effective for CPU-intensive tasks because of the GIL
- Affected by global interpreter lock
- Works well for IO-bound tasks
Answer
a,d
Which command pauses a process for a given number of seconds?
sleep
wait
kill
ps -A
Answer
a. The sleep command pauses a process for a given number of seconds.
Which of the following is NOT true about multithreading?
- Can effectively use multiple CPU cores
- Works well with IO bound tasks
- Affected by the global interpreter lock (GIL), which prevents multiple threads from running Python code at the same time
- Each thread has shared access to the SAME global variable
Answer
a
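The shared-global property in the last option can be seen in a small sketch. The names counter and add_many are illustrative, and the Lock is an addition here: it guards the read-modify-write so that updates from different threads are not lost.

```python
from threading import Thread, Lock

counter = 0    # one global, visible to every thread
lock = Lock()  # protects updates to the shared counter

def add_many(n):
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [Thread(target=add_many, args=[1000]) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

All four threads update the same variable, so the final total reflects the work of every thread. With multiprocessing, each process would instead increment its own copy.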
What is the PID?
- Process Identifier
- Process in Disguise
- Process inside Disk
- Penguin Identifying as a Dog
Answer
a
A popular theory among Swifties is that the second song of each Taylor Swift album is one of her best songs. Using the list of Taylor Swift albums, each album made up of its own list of songs, write a multithreaded program that picks the second song from each album, adds it to a new list, then prints that list in alphabetical order.
TaylorSwift = ['Tim McGraw', 'Picture to Burn', ...]
Fearless = ['Fearless', 'Fifteen', 'LoveStory'...]
albums = [TaylorSwift, Fearless, SpeakNow,...Midnights] # list of all albums
Answer
from threading import Thread

my_list = []

def get_song(album):
    global my_list
    my_list.append(album[1])

if __name__ == '__main__':
    children = []
    for album in albums:
        child = Thread(target=get_song, args=[album])
        child.start()
        children.append(child)
    for child in children:
        child.join()
    print(sorted(my_list))
Write a multithreaded program to retrieve 15 random car facts from an API.
Answer
import requests
from threading import Thread

requests.packages.urllib3.disable_warnings()
link = ??????
carfacts = []

def carinfo():
    global carfacts
    # pass the link variable itself, not the string 'link'
    resp = requests.get(link, verify=False)
    carfact = resp.json()['text']
    carfacts.append(carfact)

if __name__ == '__main__':
    children = []
    for _ in range(15):
        child = Thread(target=carinfo, args=[])
        child.start()
        children.append(child)
    for child in children:
        child.join()
    for carfact in carfacts:
        print(carfact)
Write a multiprocess program that executes a function named function over the range 0 to 1000000, where function takes the arguments start_value and end_value.
Answer
from multiprocessing import Process

if __name__ == '__main__':
    children = []
    start_value = 0
    # Four chunks of 250000; the stop of 1000001 includes the final end value
    for x in range(250000, 1000001, 250000):
        child = Process(target=function, args=[start_value, x])
        child.start()
        children.append(child)
        start_value += 250000
    for child in children:
        child.join()
    print("Done")
You are given four csv files (usna.csv, usma.csv, usafa.csv, uscga.csv) from the different service academies, containing phone usage data. Each csv is formatted as shown below.
------usna.csv-------
name,app,apptype,minutes
Tim,tiktok,entertainment,45
Sam,instagram,entertainment,30
Peter,googledrive,academic,120
Write a Python program that calculates the total hours students from all academies spend on non-academic apps in one day.
Answer
from threading import Thread
import pandas as pd

totalmin = 0

def total_min(fnames):
    global totalmin
    df = pd.read_csv(fnames)
    nonacademic_df = df[df["apptype"] != "academic"]
    for index, row in nonacademic_df.iterrows():
        totalmin = totalmin + row['minutes']

if __name__ == "__main__":
    children = []
    files = ['usna.csv', 'usma.csv', 'usafa.csv', 'uscga.csv']
    for fname in files:
        child = Thread(target=total_min, args=[fname])
        child.start()
        children.append(child)
    for child in children:
        child.join()
    hours = totalmin // 60
    mins = totalmin % 60
    print(hours, "hours", mins, "mins")
Unit 10: Data Ethics
Readings/notes page
It is ethical to take data from sources you were not authorized to use.
- True
- False
Answer
b False
Which of the following is NOT a primary tenet of data ethics?
- Promote transparency
- Hold oneself and others accountable
- Avoid using large data sets
- Stay informed of developments in the fields of data management and data science
Answer
C
It is the year 2075, and in preparation for your big 50 year reunion, the Commandant has given you access to a master file that includes major life updates such as birth of children, marriages, and deaths for the class of 2025. You decide to make a slideshow presentation featuring some of the highlights of the data you’ve found (such as 50% of those married within a month of graduation are now divorced). What are two ethical dilemmas that this situation presents?
Answer
One ethical problem is the lack of transparency toward the people whose data it is. Some people might not be OK with personal information, like the births of their children, being part of data that is displayed publicly. Another issue is the possibility of this data being accidentally released to the general public and creating a bias against Midshipman graduates. Negative biases could hurt application rates or cause alumni to have a harder time getting jobs.
You have a source of data of personal information, but the data that you need can’t be placed on one person (blood type, etc.). Can you use this data? Or what should you do in order to use this data?
Answer
You should try to get in contact with the people who gave their information and ask if you can use the data. If they say yes, feel free to use it.
Unit 11: OOP in Python
Readings/notes page
If x is located somewhere in our code, what would bool(x) return?
- would return the “type” of x
- would return either true or false
- either a 0 or 1 value
- you would receive an error message
Answer
b, or possibly (d) if the type of x does not allow it to be converted to a true/false.
What is the difference between a class and an instance variable?
- They are interchangeable
- A class variable is shared throughout the class and an instance variable applies only to each unique instance of a class.
- A class variable applies only to each unique instance of a class and an instance variable is shared throughout the class.
- A class variable is used inside the class and instance variables are used outside the class.
Answer
b
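A short sketch of the distinction, using a made-up Midshipman class: the class variable is stored once on the class and seen by every instance, while each instance keeps its own name.

```python
class Midshipman:
    total = 0  # class variable: one value shared by the whole class

    def __init__(self, name):
        self.name = name       # instance variable: unique to this object
        Midshipman.total += 1  # update the shared value on the class itself

first = Midshipman("Tim")
second = Midshipman("Sam")

print(first.name, second.name)     # each instance has its own name
print(first.total, second.total)   # both see the same shared count
```

Updating Midshipman.total through the class name changes what every instance sees, while assigning self.name affects only the one object.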
Which of the following is NOT an object?
1234
[5,6,7]
if y:... else:...
df.sort_values(by='year')
d[7]
Answer
c
Which of the following describes the term “method” in regards to Object Oriented Programming?
- A variable that is part of a class
- A function that is contained within a class and the objects that are constructed from the class
- A constructed instance of a class
- A new class created when a parent class is extended
Answer
b
Explain what the use of the __init__ method is in a class and why it is important for a class. What happens to the objects of the class when there is no __init__ method?
Answer
When a class defines an __init__ method, the arguments passed in the call to the class must correspond to __init__’s parameters, except for the self parameter. __init__ creates the attributes of a newly created instance. When a class has no __init__ method, the class must be called without any arguments, since there is no __init__ to accept them. As a result, new instances created without an __init__ method start with no instance-specific attributes.
For the following example, what would the print statement inform the user about the function “total”? Why can this be beneficial?
class Counter:
    x = 0

    def total(self):
        self.x = self.x + 1
        print("Adding", self.x)

example = Counter()
print("Type", type(example.total))
Answer
The following example would print out: Type <class 'method'>. Typically you would not want to do this, but it tells us that total is a function in the class that needs to be called like example.total(), rather than a class variable or something else.
Construct a class that takes the title of a movie and the length of the movie in minutes. Then, define a class function that prints the name and length of the movie in a nice format on different lines.
Answer
class Movie:
    def __init__(self, name, length):
        self._name = name
        self._length = length

    def _show(self):
        print("Movie Title:", self._name)
        print("Movie Length:", self._length, "minutes.")

spiderman = Movie("spiderman", 120)
Movie._show(spiderman)
class Sport:
    sport = 'ball sports'

    def __init__(self, name):
        self.name = name

What would two.sport return?
What would one.name return?
Answer
‘ball sports’
‘Basketball’
Unit 12: Typing
Readings/notes page
Say we have a function called funky_function that takes the arguments words, a list of strings, and numb, a float OR an int, and it returns an int. How would one define this function using typing?
def funky_function(words[str], numb: float | int) -> int
def funky_function(words:list[str], numb: float|int) -> int
def funky_function(words:list[str], numb: float|int): int
def funky_function(words:list[str], numb: float/int) -> int
Answer
b
Which of the following is NOT a type annotation?
bool
Process
function
None
Answer
c
Using type hints, how would you create a variable “Classyear” which holds the integer 2025?
Classyear = 2025
Classyear: int = 2025
Classyear = 2025: int
Class year = 2025 -> int
Answer
Classyear: int = 2025
Which of the following is the correct type annotation to take in a string (line) and any number (num) and returns None? Make sure that all variables are annotated.
- def show(line:int|float, num:str) -> None
- def show(line:str, num:int)
- def show(int[line], str[num]) -> None
- def show(line:str, num:int|float) -> None
- All of the above
Answer
d
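As a runnable sketch of the chosen signature: the function body below is made up for illustration, and Union[int, float] is used so the example also runs on Python versions older than 3.10, where the int|float spelling is not available in annotations.

```python
from typing import Union

# Union[int, float] means the same as int | float on Python 3.10+.
def show(line: str, num: Union[int, float]) -> None:
    print(f"{line}: {num}")

result = show("pi is roughly", 3.14)
print(result)
```

Type hints are not enforced at run time, so a tool such as mypy is what would flag a call like show(3.14, "oops").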
Write a function called years_to_grad using type hints, where the only argument is an integer that is the user’s class year, and returns a string that says “Congratulations! You have x years until you graduate!” where x is their class year - 2023.
For example: years_to_grad(2025) would print:
Congratulations! You have 2 years until you graduate!
Answer
def years_to_grad(classyear: int) -> str:
    diff: int = classyear - 2023
    message: str = "Congratulations! You have " + str(diff) + " years until you graduate!"
    print(message)
    return message  # the -> str annotation promises a string result
Given the function below, write type annotations for all variables.
def whichunit(alpha):
    """Tells which unit to write review questions for based on your alpha."""
    if isinstance(alpha, str):
        alnum = int(alpha[-6:])
    else:
        alnum = alpha
    return (alnum // 21) % 14 + 1
Answer
def whichunit(alpha: int | str) -> int:
    """Tells which unit to write review questions for based on your alpha."""
    if isinstance(alpha, str):
        alnum: int = int(alpha[-6:])
    else:
        alnum = alpha
    return (alnum // 21) % 14 + 1
Write a function called measure that takes a string and prints a string saying how many characters long it is in the format: “wow your string is ____ characters long!” Use typing.
Answer
def measure(word: str) -> None:
    a: int = len(word)
    # print the length (a), not the word itself
    print(f'wow your string is {a} characters long!')
Write a program that has three things in a class called Ball:
roll(x): moves the ball forward by the given amount
kick(): always just moves the ball forward 5
print(): prints the number that the ball is at
if __name__ == '__main__':
    ball = Ball(6)
    ball.roll(3)
    ball.roll(2)
    ball.kick()
    ball.print()
Required Output:
Ball starting at 6
Rolled 3
Rolled 2
Kicked
The ball is at 16
Answer
class Ball:
    def __init__(self, amt: int) -> None:
        self.points = amt
        print("Ball starting at " + str(self.points))

    def roll(self, amt: int) -> None:
        self.points = self.points + amt
        print("Rolled " + str(amt))

    def kick(self) -> None:
        self.points = self.points + 5
        print("Kicked")

    def print(self) -> None:
        print("The ball is at " + str(self.points))
Unit 13: Machine learning with sklearn
Readings/notes page
What is the correct order to create a pipeline?
pipe = make_pipeline(Ridge(),StandardScaler())
pipe.predict(X_data,X_label)
pipe.fit(X_test)

pipe = make_pipeline(Ridge(),StandardScaler())
pipe.fit(X_data,X_label)
pipe.predict(X_test)

pipe = make_pipeline(StandardScaler(),Ridge())
pipe.predict(X_data,X_label)
pipe.fit(X_test)

pipe = make_pipeline(StandardScaler(),Ridge())
pipe.fit(X_data,X_label)
pipe.predict(X_test)
Answer
d
What is a crucial aspect of the matrices when creating a Ridge regression model?
- the numbers must all be positive
- the data inside the columns should be both numeric and non-numeric
- the data inside the columns should be only numeric
- all the matrices need to be the same size
Answer
c
What kind of algorithm attempts to find distinct groups of data without reference to any labels?
- Regression
- Clustering
- Supervised Learning
- Classification
Answer
- Clustering
Which of the following types of machine learning is defined by its use of labeled datasets to train algorithms to classify data or predict outcomes accurately?
- Supervised
- Unsupervised
- Cluster
- Reinforcement
Answer
a
Which two models are used in supervised learning?
- Regression and Dimensionality reduction
- Dimensionality reduction and Clustering
- Clustering and Regression
- Classification and Regression
Answer
D
What are regression and classification algorithms? Describe how they are different, and an example of where you would use each.
Answer
Both are examples of supervised learning, used to predict some label value using previously-known labels, based on a common set of features.
Regression algorithms are used to determine continuous values (e.g., age or height) while classification algorithms are used to identify different categories within a dataset (e.g., Gender, Classes, Groups).
List one type of model from both supervised learning and unsupervised learning.
Answer
Supervised: Regression Unsupervised: Clustering
Why is scaling dataframes important for machine learning?
Answer
It is important to scale dataframes in order to normalize the data, so that every column is on a comparable scale. If the dataframes are not normalized, then some columns that just naturally have larger numbers (say, a year, which is typically around 2000 or more) will be weighted as more significant than a column that naturally has smaller numbers (say, a GPA, which is typically between 0 and 4). This can make it hard to find a proper linear equation for the data, skewing the results.
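The effect of scaling can be sketched with the standard library alone. The standardize helper and the sample values below are made up for illustration; it subtracts the column mean and divides by the population standard deviation, which is the same kind of transformation sklearn's StandardScaler applies to each column.

```python
from statistics import mean, pstdev

def standardize(column):
    """Hypothetical helper: rescale a column to mean 0 and standard deviation 1."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation
    return [(x - mu) / sigma for x in column]

years = [2000, 2010, 2020]  # naturally large numbers
gpas = [2.0, 3.0, 4.0]      # naturally small numbers

scaled_years = standardize(years)
scaled_gpas = standardize(gpas)

print(scaled_years)
print(scaled_gpas)
```

After scaling, both columns cover the same range of values, so neither one dominates the model just because of its units.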
What is the difference between supervised and unsupervised learning?
Answer
Supervised learning predicts labels based on labeled training data, while unsupervised learning identifies structure in unlabeled data.
Given a training dataframe, X, a testing dataframe Y, and a vector V of training labels, write a linear regression model fit to make a prediction on the testing data.
Answer
from sklearn.linear_model import Ridge

model = Ridge()
model.fit(X, V)
predictions = model.predict(Y)
Unit 14: Versions and packaging
Readings/notes page
Circle which git command is responsible for downloading changes from remote to local repository.
- add
- pull
- clone
- push
Answer
b
To update your local repository to the newest commit, execute ______ in your working directory to fetch and merge remote changes.
git add
git pull
git merge
git clone
Answer
git pull
What is the git command to retrieve an entire remote repository and create a working copy?
- Fetch
- Merge
- Commit
- Pull
- Clone
Answer
E
Dr. Timcenko wants to add a file with the name exam.txt to a folder called sec5 along with the message “Go Navy, Beat Army!”. What commands should he type in the command line to commit the changes to the local copy?
Answer
git add sec5/exam.txt
git commit -a -m "Go Navy, Beat Army!"
I have finished working on a coding assignment and now want to put it in my GitHub. Assume I have already set up the repository and just need to run the git commands at this point. I also want to note that this is the third time I am committing these files. What two commands do I need to run to accomplish this?
Answer
git commit -a -m "Third commit"
git push -u origin main
Please state the correct usage for using github, how collaborating on these pages is beneficial to the coding community, and something we have used from the github page created by the programming community.
Answer
GitHub is a website created for community collaboration on code and packages that people of all skill levels can use and contribute to. It is highly beneficial for coders: as data scientists, we rely on tools such as pandas, which is developed on GitHub, to parse through data more effectively and efficiently.