Units
The lectures are broken into 11 units, as shown below. These pages are also reachable from the calendar.
- Unit 1: Welcome back (Classes 1–3)
  Course overview, Data science pipeline, Python review
- Unit 2: Command line (Classes 4–7)
  Files and directories, bash commands, Piping and redirection
- Unit 3: Regular expressions (Classes 8–11)
  Regex syntax, Python re, Command-line tools
- Unit 4: Error handling (Classes 12–14)
  try/except, return codes, bash if statement
- Unit 5: Versions and packaging (Classes 15–16)
  git, github, pip, mamba
- Unit 6: Data cleaning (Classes 17–18)
  Missing data, Outliers, Preprocessing, Merging dataframes
- Unit 7: Hardware and OS (Classes 19–21)
  CPU, Memory hierarchy, Filesystems, Compiled vs Interpreted code, Role of the operating system
- Unit 8: Concurrency (Classes 22–26)
  Multithreading, Python GIL, Multiprocessing, bash job control
- Unit 9: OOP in Python (Classes 27–30)
  Operator overloading, Inheritance
- Unit 10: Data Ethics (Classes 31–33)
  Principles, Case studies
- Unit 11: Machine learning with sklearn (Classes 34–37)
  Statistical data types, Reading documentation, Classification, Regression, Clustering