This is the archived website of SD 212 from the Spring 2023 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

Lab 4: Info Challenge

1 Overview

We will be participating in the Info Challenge hosted at UMD, in groups of 3 or 4.

Each team will focus on a single data set by a real provider, and spend the week understanding, cleaning, analyzing, and building an effective presentation from that data.

1.1 Learning goals

  • Practice the full data science pipeline: acquisition, storage, processing/cleaning, analysis, and visualization/communication
  • Work on a real data set towards the goals of industrial or government organizations
  • Work within a team

2 Schedule and structure

(Submission deadlines in bold)

  • Early February: Choose teams

  • Friday February 24: Comp day (no class)

  • Saturday February 25: Kickoff day, at USNA

    Meet mentors, learn about datasets, and get started

  • Monday February 27: Work on IC during class

  • Tuesday February 28: Work on IC during lab

  • Wednesday March 1: Work on IC during class

  • Thursday March 2 at noon: 250-word abstracts due to IC judges

  • Friday March 3: Comp day (no class)

  • Saturday March 4 at 9am: Project files and presentations uploaded to GitHub, and the repository link submitted (see the instructions and the URL submission form)

  • Saturday March 4: Travel to UMD for final presentations and awards

  • Monday March 6 at 2359: SD212 submission due

  • Tuesday March 7: Presentations and recap during lab

3 Grading

Your work will be judged by the IC judges for prize consideration. It will also count as a lab grade for SD212, independently of the IC contest judging.

Your SD212 grade has two components:

  • 80%: Info Challenge judging rubric, as scored by your instructor based on:

    1. Your answers to the questions in the markdown file (below)
    2. Your code uploaded to GitHub
    3. Your presentation during lab time
  • 20%: Individual teamwork score based on teamwork rubric completed by all group members.

Your grade may be reduced by up to 25% for failure to follow instructions and meet required deadlines.
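
As a rough illustration of the weighting above, the sketch below combines the two components into a single percentage. The function name and the sample scores are hypothetical, and it assumes both components are expressed on a 0 to 100 scale. The downward adjustment is left out because this page does not say how it would be applied.

```python
def weighted_lab_grade(rubric_pct, teamwork_pct):
    """Combine the two components using the 80/20 weights listed above.

    Assumption: both inputs are percentages on a 0-100 scale.
    The possible downward adjustment is intentionally not modeled.
    """
    return (80 * rubric_pct + 20 * teamwork_pct) / 100


# Hypothetical scores, for illustration only.
print(weighted_lab_grade(90, 100))  # 92.0
```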

4 Questions

Please answer these questions and have one team member (only) submit them before the SD212 submission deadline.

  1. Who are your team’s members?

  2. Enter the URL of the GitHub repository that contains your code and presentation materials.

  3. Briefly describe the file organization in your GitHub repository. Where is your presentation? Where is your code and what does it do?

  4. Say a few words about how your team worked together. Who took on the role of “team manager”? How did you organize and share your work?

  5. For the Info Challenge project, what outside data source(s) did you incorporate?

  6. What did you have to do to clean and process the data?

    (Include the provided datasets as well as any outside data that you found in your discussion. Just a few sentences giving the overall idea is fine.)

  7. What did you do to analyze the data?

    (Again, just a few sentences with an overall description is good.)

  8. How did you create visualizations of your analysis?

  9. What concrete recommendations or conclusions did you make?

  10. What tips and suggestions do you have for next year’s Info Challenge participants?

4.1 Markdown file to fill in

Here is the file with questions to fill in and submit for today’s lab: lab04.md

You can run this wget command to download the blank md file directly from the command line:

wget "http://roche.work/courses/s23sd212/lab/md/lab04.md"

4.2 Submit your work

Submit the markdown file (with the GitHub link to all of your work):

submit -c=sd212 -p=lab04 lab04.md

or

club -csd212 -plab04 lab04.md

or use the web interface