SD 212 Spring 2023 / Admin


This is the archived website of SD 212 from the Spring 2023 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

12-Week Exam Information

Format

The exam is designed to take 50 minutes, but you may use as much of the 2-hour lab period as needed.

This will be a written exam with no computers, calculators, or similar devices allowed.

The only allowed aid is one study sheet. Here are the restrictions/requirements on this study sheet:

  • One side of a single letter-sized piece of paper.

    (Note, it is OK if the other side is your 6-week study sheet.)

  • Hand-written and prepared by you.

  • Write your name clearly at the top of the sheet. It will be handed in along with your exam (and returned to you later).

The time you take to look over your notes, think about what to write down, and create your study sheet is very valuable in studying and preparing for the exam, probably more valuable than the information on the sheet will be during the exam itself. For this reason, simply copying what someone else has on their sheet is probably a waste of your time.

Coverage

The 12-week exam will cover everything in the class so far, mostly focusing on units 6–10.

The best things to review are:

  • Your own notes from class
  • The readings from each unit
  • The homework assignments so far. (These are probably the best guide for the kinds of problems you can expect to see on the exam!)

A brief summary (not exhaustive!) of topics and concepts that may appear on the exam is as follows.

  • Computing concepts

    • Major hardware components (Processor, Memory, Disk, I/O)
    • Memory hierarchy: Registers, cache, RAM, disk, cloud
    • Processor instructions (Arithmetic, load/store, branching)
    • What is an operating system and what does it do
    • What is a program vs a process
    • Compiled vs interpreted programs
    • Compiled Python libraries such as numpy and pandas
    • Computing bottlenecks: CPU, memory, I/O
    • Running multiple processes at once
    • Advantages and disadvantages of multithreading vs multiprocessing
    • CPU-bound vs I/O-bound programs
    • Shared memory in multithreading, copied memory in multiprocessing
    • “Real”/“Wall-clock” time vs CPU time
  • Data science concepts

    • Data science pipeline (review from SD211, still important!)
    • Missing data
    • Mis-formatted data
    • Erroneous data
    • Ethical data collection (consent, purpose, compensation, privacy)
    • Ethical data processing and analysis (bias, categorizing people, revealing private information, unintended consequences)
    • Ethical data visualization and communication (fair comparisons, clarity, appropriate visualizations)
  • Python skills

    • Reading from and writing to CSV files
    • Checking for, removing, or replacing missing DataFrame entries
    • Combining multiple dataframes
    • Sorting dataframes by one or more columns
    • (Review from earlier, but still important): Selecting rows or columns of a dataframe based on indexes, names, or simple conditions
    • Writing a multi-threaded program
    • Writing a multi-processing program
    • Dividing a computational task into sub-tasks for separate threads or processes
    • Using global variables to communicate between threads
    • Using SimpleQueue for inter-process communication
  • Command-line (bash) skills

    • Data cleaning with sed, grep, cut, pipelines, and regular expressions
    • Viewing processes with ps; understanding PID and PPID
    • Running in the background with & and wait
    • Checking real and CPU execution time with time
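
As a quick refresher on the pandas items in the Python skills list, here is a minimal sketch that reads CSV data, handles a missing entry, combines two dataframes, sorts, and selects rows. The names, sections, scores, and majors are invented purely for illustration, and the choice to fill a missing score with 0 is an assumption, not a recommendation for real data.

```python
import io

import pandas as pd

# Hypothetical data, invented for illustration only.
scores_csv = io.StringIO(
    "name,section,score\n"
    "Avery,1,88\n"
    "Blake,2,\n"
    "Casey,1,95\n"
)
majors = pd.DataFrame({
    "name": ["Avery", "Blake", "Casey"],
    "major": ["Data Science", "Math", "Data Science"],
})

# read_csv accepts a filename or any file-like object.
scores = pd.read_csv(scores_csv)

# Check for missing entries: count of NaN values in each column.
missing_counts = scores.isna().sum()

# Two ways to handle them: drop the affected rows, or fill in a value.
dropped = scores.dropna(subset=["score"])
filled = scores.fillna({"score": 0})

# Combine two dataframes on a shared column.
combined = filled.merge(majors, on="name", how="left")

# Sort by section (ascending), then by score (descending).
ranked = combined.sort_values(["section", "score"], ascending=[True, False])

# Select rows with a simple condition and columns by name.
section_one = ranked.loc[ranked["section"] == 1, ["name", "score"]]

# With no filename, to_csv returns the CSV text instead of writing a file.
csv_text = ranked.to_csv(index=False)
```

Passing a filename to `read_csv` and `to_csv` instead of the in-memory objects used here reads from and writes to actual files.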
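
The threading and multiprocessing items above can be sketched side by side. This is one possible way to divide a task into sub-tasks, not the required approach: the task (summing the numbers 1 to 100), the worker count, and all function names are assumptions made up for the example. The threads communicate through a global list because they share memory, while the processes send results back through a `SimpleQueue` because each one works on its own copy.

```python
import multiprocessing
import threading

# Hypothetical task: add up the numbers 1..100 using 4 workers.
NUMBERS = list(range(1, 101))
NUM_WORKERS = 4

# Global list shared by all threads (threads share the program's memory).
thread_results = [0] * NUM_WORKERS


def chunk(worker_id):
    """Return the part of NUMBERS assigned to one worker."""
    return NUMBERS[worker_id::NUM_WORKERS]


def thread_worker(worker_id):
    # Each thread writes only to its own slot, so no lock is needed here.
    thread_results[worker_id] = sum(chunk(worker_id))


def process_worker(worker_id, queue):
    # A child process has its own copy of memory, so it sends its answer back.
    queue.put(sum(chunk(worker_id)))


def sum_with_threads():
    threads = [threading.Thread(target=thread_worker, args=(i,))
               for i in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(thread_results)


def sum_with_processes():
    queue = multiprocessing.SimpleQueue()
    procs = [multiprocessing.Process(target=process_worker, args=(i, queue))
             for i in range(NUM_WORKERS)]
    for p in procs:
        p.start()
    # Collect exactly one result per process, then wait for them to exit.
    total = sum(queue.get() for _ in range(NUM_WORKERS))
    for p in procs:
        p.join()
    return total


if __name__ == "__main__":
    print(sum_with_threads())
    print(sum_with_processes())
```

Both calls print 5050. If `process_worker` wrote to `thread_results` instead of using the queue, the parent process would never see the change, since the child only modifies its own copy.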
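
The difference between real (wall-clock) time and CPU time can also be observed from inside Python. The sketch below is a rough analogue of what the bash `time` command reports, not a replacement for it; the helper `measure` and the two example tasks are hypothetical names chosen for illustration. A sleeping task stands in for an I/O-bound program, and a loop of arithmetic stands in for a CPU-bound one.

```python
import time


def measure(task):
    """Return (wall_seconds, cpu_seconds) for one call to task()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return wall_used, cpu_used


def waiting_task():
    # Stands in for I/O: the process sleeps and uses almost no CPU.
    time.sleep(0.3)


def computing_task():
    # Pure arithmetic: the CPU is busy for the whole call.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


if __name__ == "__main__":
    for task in (waiting_task, computing_task):
        wall, cpu = measure(task)
        print(f"{task.__name__}: real {wall:.2f}s, cpu {cpu:.2f}s")
```

For the waiting task, real time should be far larger than CPU time; for the computing task, the two should be close. The exact numbers depend on the machine.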