

Lab 4: Info Challenge

1 Overview

We will be participating in the UMD Info Challenge, in groups of 3 or 4.

Each team will focus on a single data set by a real provider, and spend the week understanding, cleaning, analyzing, and building an effective presentation from that data.

1.1 Learning goals

  • Practice the full data science pipeline: acquisition, storage, processing/cleaning, analysis, and visualization/communication
  • Work on a real data set towards the goals of industrial or government organizations
  • Work within a team

2 Schedule and structure

(Dates and deadlines in bold)

  • Early February: Choose teams

  • Wednesday February 22: Choose dataset

  • Friday February 23: Comp day (no class)

  • Saturday February 24 at 10am in Hopper Hall: Kickoff day, at USNA

    Meet mentors, learn about datasets, and get started

  • Monday February 26: Work on IC during class

  • Tuesday February 27: Work on IC during lab

  • Wednesday February 28: Work on IC during class

  • Thursday February 29 at noon: 250-word abstracts due to IC judges. Submit your abstract here

  • Friday March 1: Comp day (no class)

  • Saturday March 2 at 9am: Project files and presentations uploaded to GitHub and link submitted: URL submission form

  • Saturday March 2: Travel to UMD for final presentations and awards

  • Monday March 4 at 2359: SD212 submission due

  • Tuesday March 5: Presentations and recap during lab

3 Helpful documents

4 Discord

A lot of useful information about the event will be available via Discord.

You should sign up for a (free) account if you don’t have one already, and then follow this link to join the Discord server for IC24.

You can download the Discord app on any of your devices, or use the web browser interface.

When you first log in, you have to give your USNA email address to a bot, which should then give you access to the IC24 rooms as a participant.

5 GitHub

You should definitely make a GitHub repo for your work on the Info Challenge! Look back at your notes from our recent unit in class and the related homeworks if you need a reminder of how to do that.

You should have just one GitHub repo per team. Once a single team member creates their GitHub repo for the info challenge, they can invite their teammates to it (as well as their instructor).
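
If you need a starting point, the sketch below shows one way the first team member might set up the repo locally before connecting it to GitHub. The directory name, README text, and commit identity are placeholders chosen for illustration, and the GitHub URL is deliberately left for you to fill in. Where your class notes differ, follow them instead.

```shell
# Sketch only: names below are placeholders, not values provided by the course.
set -e

repo="ic24-team-repo"        # hypothetical directory name; pick your own
mkdir -p "$repo"
cd "$repo"

# Start a new local repository with a single README file.
git init -q
printf '# Info Challenge 2024\n' > README.md
git add README.md

# The -c options set a throwaway identity for this one commit;
# skip them if git is already configured with your name and email.
git -c user.name="Team Member" -c user.email="member@example.com" \
    commit -q -m "Initial commit"

# Name the branch "main" so the push command below matches.
git branch -M main

# After creating an empty repo on github.com, the creator would run:
#   git remote add origin <URL of your team's GitHub repo>
#   git push -u origin main
# Teammates accept the invitation and then run:
#   git clone <URL of your team's GitHub repo>
```

Only one person needs to run these steps. Everyone else clones the shared repo once they have been invited.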

6 Grading

Your work will be judged by the IC judges for prize consideration. It will also count as a lab grade for SD212, independently of the IC contest judging.

Your SD212 grade has two components:

  • 80%: Info Challenge judging rubric, as scored by your instructor based on:

    1. Your answers to the questions in the Markdown file (below)
    2. Your code uploaded to GitHub
    3. Your presentation during lab time
  • 20%: Individual teamwork score based on teamwork rubric completed by all group members.

Your grade may be reduced by up to 25% for failure to follow instructions and meet required deadlines.

7 Questions

Please answer these questions and have one team member (only) submit them before the SD212 submission deadline.

  1. Who are your team’s members?

  2. Enter the URL of the GitHub repository that contains your code and presentation materials.

  3. Briefly describe the file organization in your GitHub repository. Where is your presentation? Where is your code and what does it do?

  4. Say a few words about how your team worked together. Who took on the role as “team manager”? How did you organize and share your work?

  5. For the Info Challenge project, what outside data source(s) did you incorporate?

  6. What did you have to do to clean and process the data?

    (Include the provided datasets as well as any outside data that you found in your discussion. Just a few sentences giving the overall idea is fine.)

  7. What did you do to analyze the data?

    (Again, just a few sentences with an overall description is good.)

  8. How did you create visualizations of your analysis?

  9. What concrete recommendations or conclusions did you make?

  10. What tips and suggestions do you have for next year’s Info Challenge participants?

7.1 Markdown file to fill in

Here is the file with questions to fill in and submit for today’s lab: lab04.md

You can run this wget command to download the blank md file directly from the command line:

wget "http://roche.work/courses/s24sd212/lab/md/lab04.md"

7.2 Submit your work

Submit the Markdown file (with the GitHub link to all of your work):

submit -c=sd212 -p=lab04 lab04.md

or

club -csd212 -plab04 lab04.md

or use the web interface