SD 212 Spring 2026


Units

The lectures are divided into 10 units, listed below. The unit pages are also reachable from the calendar.

  • Unit 1: Welcome back (Classes 1–3)
    Course overview, Data science pipeline, Python review
  • Unit 2: OOP in Python (Classes 4–9)
    Operator overloading, Inheritance
  • Unit 3: Command line (Classes 10–13)
    Files and directories, bash commands, Piping and redirection
  • Unit 4: Regular expressions (Classes 14–16)
    Regex syntax, Python re, Command-line tools
  • Unit 5: Error handling (Classes 17–20)
    try/except, return codes, bash if statement
  • Unit 6: Data cleaning (Classes 21–22)
    Missing data, Outliers, Preprocessing, Merging dataframes
  • Unit 7: Hardware and OS (Classes 23–25)
    CPU, Memory hierarchy, Filesystems, Compiled vs. interpreted code, Role of the operating system
  • Unit 8: Concurrency (Classes 26–30)
    Multithreading, Python GIL, Multiprocessing, bash job control
  • Unit 9: Versions and packaging (Classes 31–34)
    git, GitHub, pip, pixi
  • Unit 10: Data ethics (Classes 35–37)
    Principles, Case studies