Units
The lectures are broken into 14 units, as shown below. These pages are also reachable from the calendar.
- Unit 1: Welcome back (Classes 1–3)
  Course overview, Data science pipeline, Python review
- Unit 2: Command line (Classes 4–7)
  Files and directories, bash commands, Piping and redirection
- Unit 3: Statistical data types (Class 8)
  Categorical, ordinal, continuous, discrete
- Unit 4: Regular expressions (Classes 9–12)
  Regex syntax, Python re, Command-line tools
- Unit 5: Error handling (Classes 13–15)
  try/except, return codes in bash
- Unit 6: Data cleaning (Classes 16–18)
  Missing data, Outliers, Preprocessing
- Unit 7: Info Challenge (Classes 19–20)
- Unit 8: Hardware and OS (Classes 21–23)
  CPU, Memory hierarchy, Filesystems, Role of the operating system
- Unit 9: Concurrency (Classes 24–27)
  Multithreading, Python GIL, Multiprocessing, pickle, shell job control
- Unit 10: Data Ethics (Classes 28–29)
  Principles, Case studies
- Unit 11: OOP in Python (Classes 30–33)
  Operator overloading, Inheritance, Naming conventions, Generators
- Unit 12: Typing (Classes 34–35)
  Type hints, Linters, Static vs run-time checks
- Unit 13: Machine learning with sklearn (Classes 36–38)
  Reading documentation, Classification, Regression
- Unit 14: Versions and packaging (Classes 39–40)
  git, pip, conda