

This is the archived website of SD 212 from the Spring 2023 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

Lab 5: Web survey with dynamic analysis

1 Overview

In today’s lab you are each going to conduct a very un-scientific survey. To do that, you will create web APIs that specify your survey questions, collect survey answers, and create dynamic visualizations that are automatically updated as more responses come in.

Essentially what we are doing is re-creating (some of) the functionality of Google Forms, except that the visuals you create will incorporate data from multiple questions instead of just one at a time.

This is a little different from our previous labs because you are not starting with a known dataset and processing it; instead, you have to create the dataset yourself by collecting survey results. This is valuable for a data scientist because it gives you insight into how “messy” real-world responses can be, and it gives us some good motivation to understand multi-processing, which is the next unit we will cover in class.

You will continue to build your skills and confidence using pandas to create and process tabular data and plotly to generate visuals. You will also get experience with a new library, flask, which lets us easily create web APIs.

1.1 Deadlines

  • Milestone: 2359 on Tuesday, 28 March
  • Complete lab: 2359 on Sunday, 2 April

1.2 Learning goals

  • Develop survey questions and visual analysis
  • Implement static and dynamic web APIs
  • Perform real-world data cleaning and error handling based on user inputs
  • Use Python libraries for a variety of data science tasks
  • Experience some benefits and drawbacks of single-threaded processing in a web server context

2 Preliminaries

2.1 Install flask

Get this started while you read about the lab:

mamba activate sd212
mamba install flask

2.2 Markdown file to fill in

Here is the file with questions to fill in and submit for today’s lab: lab05.md

You can run this wget command to download the blank md file directly from the command line:

wget "https://roche.work/courses/s23sd212/lab/md/lab05.md"

2.3 Heads up: Interaction required

You cannot complete this lab entirely on your own! Part of the lab requires you to complete the surveys of your classmates, and vice-versa. It’s important to have your survey up and running (properly collecting data) by the start of the second week of lab.

3 Getting started with Flask (10 pts)

3.1 Creating and starting the web server

As usual, you will want to make a new directory for this lab inside your sd212 folder. The only Python code you will write for this lab will be in a file called survey.py. Create that file now and copy/paste these contents to get started:

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/hello')
def hello():
    return "YOUR MESSAGE HERE\n"

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=YOUR_PORT_HERE, debug=True, threaded=False)

Make two replacements:

  • Change YOUR MESSAGE HERE to any string you want.
  • Change YOUR_PORT_HERE to a number in the range 10000 up to 19999. You can use any port number that no one else is using, and a convenient way to achieve that is to make it 1 followed by the last 4 digits of your alpha.
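As a quick check of the arithmetic, here is a small sketch of the “1 followed by the last 4 digits” rule. The function name port_from_alpha and the sample alpha are made up for illustration; you can just as easily work the number out by hand.

```python
def port_from_alpha(alpha):
    """Return 1 followed by the last four digits of the alpha, as an int."""
    last_four = str(alpha)[-4:]
    if len(last_four) != 4 or not last_four.isdigit():
        raise ValueError("expected an alpha ending in four digits")
    return int("1" + last_four)

# Hypothetical alpha, for illustration only
print(port_from_alpha("261234"))  # prints 11234
```

Any result built this way falls in the 10000 to 19999 range automatically.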

Now save the file and make sure you are in the right folder as well as the right mamba environment. Once you’re all set, go ahead and run it.

In the terminal, you should see some messages telling you that Flask is running. This is your small web server, waiting to get some requests. It will keep running and auto-updating every time you save your survey.py file. You probably want to keep it running the whole time while you are actively working on this lab. If it’s getting messed up, you can kill it by typing Ctrl-C in the terminal.

3.2 Accessing the server from a browser and from the command line

In Flask, every web page that your small web server supports corresponds to a function in Python. When someone accesses that page from another application like a web browser, that browser sends a GET request to your web server, which then runs the code in that function and returns the result.
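To make the page-to-function idea concrete, here is a toy model of routing written in plain Python. This is not how Flask is actually implemented; the names routes, route, and handle_get are invented for this sketch, and it only illustrates the idea that a path is looked up and the matching function is called.

```python
# Toy model of routing: NOT Flask's real implementation, just the idea.
routes = {}

def route(path):
    """Decorator that registers a function as the handler for a URL path."""
    def register(func):
        routes[path] = func
        return func
    return register

@route('/hello')
def hello():
    return "Hi there\n"

def handle_get(path):
    """Look up the function for this path and return whatever it returns."""
    if path not in routes:
        return "404 Not Found\n"
    return routes[path]()

print(handle_get('/hello'), end="")
```

A real web server also has to deal with the network, headers, and status codes, which is the work Flask does for you.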

The URL of your simple “hello” page will be like this:

http://HOSTNAME:PORT/hello

The HOSTNAME part is the machine your Flask server is running on. On a lab machine, this will be something like lnx348942govt, which is probably on a sticker somewhere on your computer, or at the beginning of your terminal prompt, or available by running the hostname command. If you are using SSH to midn.cs, just use csmidn as the hostname.

The PORT is of course the number in the range 10000–19999 you selected before.

If everything is working, you should be able to enter that URL into a web browser and see your message come back. Try it! You should also be able to see the exact same thing by running the curl command from a (new!) terminal window, like

curl 'http://lnx324899govt:19943/hello'

Try it on your own URL as well as the URL of a friend — you should get their message back of course.

Answer these questions in the markdown file:

  1. What is the URL of your “hello” page?

  2. Look carefully at the terminal running your Flask server when you refresh your “hello” page. The Python terminal should spit out a single line to tell you that that page was just accessed and that everything went smoothly. Copy-paste that line for this question.

  3. Access a classmate’s “hello” page. What is their URL and what is the message that you see?

3.3 Submit what you have so far

Submit your work so far:

submit -c=sd212 -p=lab05 lab05.md survey.py

or

club -csd212 -plab05 lab05.md survey.py

or use the web interface

4 Creating the survey questions (30 pts)

4.1 “questions” page and JSON dictionary

Now it’s time to start creating your survey! The first step is to choose the questions. Your survey can be about anything you want, and for the end of the lab you will need to create a visualization (plotly graph) that somehow incorporates data from all the questions in your survey.

Your survey can have just two kinds of questions: multiple-choice and short-answer. You need to have at least one of each kind of question.

(Don’t get too bogged down in what the best questions are just yet; you can change things later before your classmates start taking the survey.)

Your questions will be specified using a JSON dictionary that is set up like this:

  • “name” is your name (the survey designer)

  • “questions” is a list of dictionaries for each question. Each question should itself have one or two entries:

    • “prompt” is a string for the prompt of that question (the text that is displayed to the user so they can fill something in)

    • For multiple-choice questions only, “choices” is a nested list of strings for each of the possible choice options.

      (For short-answer questions, don’t include the “choices” key in that dictionary, just the “prompt”.)
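As a sketch of that layout, here is what such a dictionary might look like in Python for a made-up two-question survey. The name, prompts, and choices are placeholders, and this reads “choices” as a plain list of strings nested inside the question dictionary, which is an interpretation of the description above.

```python
import json

# Made-up sample survey; the name, prompts, and choices are placeholders.
dic = {
    "name": "Sample Designer",
    "questions": [
        {
            # Multiple-choice: has both "prompt" and "choices"
            "prompt": "What is your favorite season?",
            "choices": ["Winter", "Spring", "Summer", "Fall"],
        },
        {
            # Short-answer: only a "prompt"
            "prompt": "How many hours did you sleep last night?",
        },
    ],
}

# Show the dictionary as JSON text
print(json.dumps(dic, indent=2))
```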

To share your survey with the world, you will add a /questions page to your Flask server, that returns a JSON dictionary as specified above.

Because each “page” of your Flask server corresponds to a Python function, this just means adding a function like this:

@app.route('/questions')
def questions():
    dic = {}
    # ... your code to fill in the dictionary as specified above
    return jsonify(dic)

Notice carefully how this works: just like the “hello” page, it’s just a Python function that returns some data. In this case, instead of returning a string, we return a JSON-encoded object by using Flask’s jsonify() function.

(This function is a glorified wrapper around the built-in json.dumps() if you remember that from when we introduced JSON in SD211.)

Save your survey.py and you should see the Flask terminal reload the page automatically. Try navigating to your /questions page in your web browser. If all has gone well, you should see a nice JSON-formatted dictionary as specified, with each question inside its own nested dictionary.

4.2 View your survey

JSON is great for computers to understand, but we want humans to take your survey! Because you haven’t taken a web programming class and it’s not the main focus of SD212, we have provided a tool that will turn your JSON questions into a very primitive-looking web form.

Go to this page and fill in your hostname and port as before and click submit. If all goes well, you should see your very own survey questions and spaces to fill in the answers. More likely you may have some errors — read the error messages and debug your survey.py accordingly!

(Hint: Try using csfaculty as the server name and 11985 as the port to see Dr. Roche’s sample survey in action.)

(Note: the submit button on your survey will definitely not work yet if you click it. That is the next part of the lab!)

When it looks OK, answer these questions in the markdown file:

  1. What is your survey about?

  2. How many short-answer and multiple-choice questions does your survey have?

4.3 Submit what you have so far

Submit your work so far:

submit -c=sd212 -p=lab05 lab05.md survey.py

or

club -csd212 -plab05 lab05.md survey.py

or use the web interface

4.4 Milestone

For this lab, the milestone means everything up to this point, which includes the following auto-tests:

md_part3
md_part4

Remember: this milestone is not necessarily the half-way point. Keep going!

5 Collect the responses (30 pts)

5.1 The request dictionary

The next step is to get the survey submission actually working. This will be another page that your Flask server provides, which means another Python function.

Start with this:

@app.route('/answer', methods=['POST'])
def answer():
    print("vvvvvvvvvvvvvv DICTIONARY vvvvvvvvvvvvvvv")
    print(dict(request.form))
    print("^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^")
    return "Got it"

Note the slight difference in the app.route here where we specify methods=['POST']. That is because, unlike the other Flask pages you are creating, this one needs to actually receive data from a submitted web form.

Now try loading your survey from here and actually filling in and submitting it. Two things should happen:

  • In the web browser, you see just the returned message “Got it”.
  • In the Flask terminal, you should see the print statements.

Look carefully at that debugging output in the Flask terminal. It is showing you that request.form is a dictionary with keys like name, q1, q2, q3, etc., and corresponding values depending on whatever was typed into the form.
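For intuition about where those keys come from: a submitted web form typically arrives as URL-encoded text in the body of the POST request, and Flask decodes it into request.form for you. The sketch below uses the standard library to mimic that decoding on a made-up body. The field values are invented, and the exact encoding used by the survey form is an assumption here.

```python
from urllib.parse import urlencode, parse_qsl

# Made-up answers, using the key names seen in the debugging output
submitted = {"name": "Sample Student", "q1": "Summer", "q2": "7"}

# Roughly what travels in the body of the POST request
body = urlencode(submitted)
print(body)

# Decoding it gives back a plain dictionary of strings
form = dict(parse_qsl(body))
print(form)
```

Notice that every value comes back as a string, even the answer that looks like a number.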

Every time you (or someone else) submits your survey, this Python function will be called! Now you need to modify it to actually save the responses into a file.

5.2 Saving to responses.csv

You should modify the function started above so that it saves the survey responses into a CSV file called responses.csv.

Your csv file should pretty much match up with what comes directly off the form. So you just need columns for name, q1, q2, etc.

In order to save some headaches later, you may want to do some basic data processing/cleaning here! Make sure the responses are in the correct format so that you will be able to analyze them later. Make sure they aren’t blank or missing! If something is wrong, you should return a message like “ERROR”.
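One possible shape for that checking, as a minimal sketch: a helper that looks at a plain dictionary of form values and returns an error string or None. The function name check_response, its parameters, and the sample survey layout are all hypothetical; your own checks will depend on your questions.

```python
def check_response(form, expected_keys, choices_by_key):
    """Return an error string, or None if the response looks OK."""
    for key in expected_keys:
        # Treat a missing key the same as a blank answer
        value = form.get(key, "").strip()
        if value == "":
            return f"ERROR: {key} is blank or missing"
        # For multiple-choice questions, the answer must be a listed choice
        allowed = choices_by_key.get(key)
        if allowed is not None and value not in allowed:
            return f"ERROR: {key} is not one of the choices"
    return None

# Hypothetical survey layout: q1 is multiple-choice, q2 is short-answer
expected = ["name", "q1", "q2"]
choices = {"q1": ["Winter", "Spring", "Summer", "Fall"]}

print(check_response({"name": "Sam", "q1": "Summer", "q2": "7"}, expected, choices))
print(check_response({"name": "Sam", "q1": "", "q2": "7"}, expected, choices))
```

Inside the Flask function, something like dict(request.form) could be passed in as the first argument.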

If the data looks OK and you successfully save it to the csv, then instead return a message like “Thank you NAME. Your response has been recorded” (using the responder’s actual name of course).

Important: You can use pandas or just the built-in open and print to save to the responses.csv file. But make sure you are just adding to the end and not overwriting the previous responses! You should get one more row in the csv every time someone submits the survey.
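Here is a minimal sketch of appending without overwriting. It uses the standard csv module rather than pandas or print, mainly so that answers containing commas are quoted correctly; that is a substitution, not a requirement of the lab. The helper name append_row and the sample rows are made up, and the demo writes to a throwaway temporary file instead of your real responses.csv.

```python
import csv
import os
import tempfile

def append_row(filename, row, columns):
    """Add one row to the end of the CSV, writing the header only once."""
    is_new = not os.path.exists(filename) or os.path.getsize(filename) == 0
    # Mode 'a' appends; mode 'w' would wipe out earlier responses
    with open(filename, 'a', newline='') as fout:
        writer = csv.DictWriter(fout, fieldnames=columns)
        if is_new:
            writer.writeheader()
        writer.writerow(row)

# Demo on a temporary file with made-up responses
demo = os.path.join(tempfile.mkdtemp(), "responses.csv")
cols = ["name", "q1", "q2"]
append_row(demo, {"name": "Sam", "q1": "Summer", "q2": "7"}, cols)
append_row(demo, {"name": "Alex", "q1": "Fall", "q2": "6, maybe"}, cols)

with open(demo, newline='') as fin:
    print(fin.read())
```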

5.3 Gather some responses

Fill out your own survey. Get some friends to fill it out too! This isn’t a “real” experiment, so it’s OK to have the same person fill it out multiple times to get some more data.

  1. Describe one specific way I could submit your survey form so that it triggers your error checking and does not get recorded in responses.csv.

  2. Enter the names of at least three classmates who successfully filled out your survey.

5.4 Submit what you have so far

Submit your work so far, including the responses csv file:

submit -c=sd212 -p=lab05 lab05.md survey.py responses.csv

or

club -csd212 -plab05 lab05.md survey.py responses.csv

or use the web interface

6 Create a dynamic visualization (30 pts)

The last step is to create one more page in your Flask application that produces a nice visualization (graph) of the survey results.

Your page should be called /visual and it should return a web page for some kind of graph based on the survey results in responses.csv.

I strongly recommend using pandas to load and process the csv, and plotly express to create the graph.

Note: We have used Plotly many times in SD211 and SD212. So far, we usually create the figure and then call .show(), something like:

import plotly.express as px

# ... create a dataframe called df

fig = px.scatter(df, x="year", y="price")
fig.show()

But that’s not what we want here: you don’t want the graph to show up in your own web browser on whatever machine you happen to be running Flask from; you want to share it with the world (or at least the small world of the USNA intranet).

So instead of calling fig.show(), in your Flask function you will do

return fig.to_html()

which turns the graph into a web page (HTML) and returns that to the web browser that made the request.

Once this works, you should be able to type

http://HOSTNAME:PORT/visual

into a web browser (using your own hostname and port) and see your pretty graph pop up. And if you share that URL with someone else, they should see your graph in their web browser.

Most importantly, the visualization should be dynamic, meaning it is automatically re-generated and kept up-to-date with the most recent survey data. That is: if 10 more people submit the survey and you refresh the /visual page, the graph should actually change in response to those new survey results.
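The key to making it dynamic is to load responses.csv inside the function, so the file is read again on every request rather than once at startup. Below is a minimal sketch of that idea. It uses only the standard library in place of pandas and plotly so that it runs anywhere; in the lab, the same pattern would apply to loading the dataframe and building the figure. The name count_answers and the sample data are hypothetical.

```python
import csv
import os
import tempfile
from collections import Counter

def count_answers(filename, column):
    """Re-read the CSV from disk and tally the answers in one column."""
    with open(filename, newline='') as fin:
        return Counter(row[column] for row in csv.DictReader(fin))

# Demo on a throwaway file with made-up responses
demo = os.path.join(tempfile.mkdtemp(), "responses.csv")
with open(demo, 'w', newline='') as fout:
    fout.write("name,q1\nSam,Summer\nAlex,Fall\n")
print(count_answers(demo, "q1"))

# A new response arrives; the next call sees it without any restart
with open(demo, 'a', newline='') as fout:
    fout.write("Pat,Summer\n")
print(count_answers(demo, "q1"))
```

If the file were instead loaded once at the top of survey.py, the tallies would stay frozen at whatever the file contained when the server started.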

  1. Name at least one classmate who loaded your visualization in their own browser using the URL you sent them.

  2. Give a few sentences on what you think about the actual results of your survey. Did you discover anything interesting? What does your visualization tell us about the data?

    Save an image of your visualization as graph.png and submit that along with everything else for this lab.

  3. Look at one classmate’s visualization in your web browser using their URL. Who is the classmate, and what do you think about the results of their survey? (A sentence or two is enough.)

7 Submit your work

I’m putting these questions last this time to make sure you don’t forget them.

  1. What sources of help (if any) did you utilize to complete this lab?

  2. What did you think of the lab overall?

Save your files and submit everything:

submit -c=sd212 -p=lab05 lab05.md survey.py responses.csv graph.png

or

club -csd212 -plab05 lab05.md survey.py responses.csv graph.png

or use the web interface