Units
The lectures are broken into 10 units, as shown below. These pages are also reachable from the calendar.
- Unit 1: Welcome back (Classes 1–3)
  Course overview, Data science pipeline, Python review
- Unit 2: OOP in Python (Classes 4–9)
  Operator overloading, Inheritance
- Unit 3: Command line (Classes 10–13)
  Files and directories, bash commands, Piping and redirection
- Unit 4: Regular expressions (Classes 14–16)
  Regex syntax, Python re, Command-line tools
- Unit 5: Error handling (Classes 17–20)
  try/except, return codes, bash if statement
- Unit 6: Data cleaning (Classes 21–22)
  Missing data, Outliers, Preprocessing, Merging dataframes
- Unit 7: Hardware and OS (Classes 23–25)
  CPU, Memory hierarchy, Filesystems, Compiled vs Interpreted code, Role of the operating system
- Unit 8: Concurrency (Classes 26–30)
  Multithreading, Python GIL, Multiprocessing, bash job control
- Unit 9: Versions and packaging (Classes 31–34)
  git, github, pip, pixi
- Unit 10: Data Ethics (Classes 35–37)
  Principles, Case studies