SD 212 Spring 2024 / Labs


Lab 5: Web survey with dynamic analysis

1 Overview

In today’s lab you are each going to conduct a very un-scientific survey. To do that, you will create web APIs that specify your survey questions, collect survey answers, and create dynamic visualizations that are automatically updated as more responses come in.

Essentially what we are doing is re-creating (some of) the functionality of Google Forms, except that the visuals you create will incorporate data from multiple questions instead of just one at a time.

This is a little different from our previous labs: instead of starting with a known dataset and processing it, you have to create the dataset yourself by collecting survey results. This is valuable for a data scientist because it will give you insight into how “messy” real-world responses can be. It will also give us some good motivation to understand multi-processing, which is the next unit we will cover in class.

You will continue to build your skills and confidence using pandas to create and process tabular data and plotly to generate visuals. You will also get experience with a new library, flask, which lets us easily create web APIs.

1.1 Deadlines

  • Milestone: 2359 on Monday, 25 March
  • Complete lab: 2359 on Friday, 29 March

1.2 Learning goals

  • Develop survey questions and visual analysis
  • Implement static and dynamic web APIs
  • Perform real-world data cleaning and error handling based on user inputs
  • Use Python libraries for a variety of data science tasks
  • Experience some benefits and drawbacks of single-threaded processing in a web server context

2 Preliminaries

2.1 Markdown file to fill in

Here is the file with questions to fill in and submit for today’s lab: lab05.md

You can run this wget command to download the blank md file directly from the command line:

wget "http://roche.work/courses/s24sd212/lab/md/lab05.md"

2.2 Heads up: Interaction required

You cannot complete this lab entirely on your own! Part of the lab requires you to complete the surveys of your classmates, and vice-versa. It’s important to have your survey up and running (properly collecting data) by the start of the second week of lab.

3 Getting started with Flask (10 pts)

3.1 Creating and starting the web server

As usual, you will want to make a new directory for this lab inside your sd212 folder. The only Python code you will write for this lab will be in a file called survey.py. Create that file now and copy/paste these contents to get started:

from flask import Flask, jsonify, request, render_template
import jinja2

app = Flask(__name__, template_folder='.')
app.jinja_env.undefined = jinja2.StrictUndefined

@app.route('/hello')
def hello():
    return "YOUR MESSAGE HERE\n"

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=YOUR_PORT_HERE, debug=True, threaded=False)

Make two replacements:

  • Change YOUR MESSAGE HERE to any string you want.
  • Change YOUR_PORT_HERE to a number in the range 10000 up to 19999. You can use any port number that no one else is using, and a convenient way to achieve that is to make it 1 followed by the last 4 digits of your alpha.

Now save the file and make sure you are in the right folder as well as the right mamba environment. Once you’re all set, go ahead and run it.

In the terminal, you should see some messages telling you that Flask is running. This is your small web server, waiting to get some requests.

IMPORTANT: Keep this terminal open and running survey.py the whole time while you are working on this lab. It will keep auto-updating every time you save your survey.py file. Helpfully, it will show you every time someone connects to your server, and also display any errors that occur. If it’s getting messed up, you can kill it by typing Ctrl-C in the terminal.

3.2 Accessing the server from a browser and from the command line

In Flask, every web page that your small web server supports corresponds to a function in Python. When someone accesses that page from another application like a web browser, that browser sends a GET request to your web server, which then runs the code in that function and returns the result.

The URL of your simple “hello” page will be like this:

http://HOSTIP:PORT/hello

The HOSTIP part is the IP address of the machine your Flask server is running on. You can see your current computer’s IP address by running this on the command line:

hostname -I

Usually this will display multiple IP addresses. The one you want (if you are on the USNA network) is the one whose first number is 10. For example, the IP address for ssh.cs is 10.1.83.73, and the IP address for one of the lab machines in HP101 is 10.60.37.232.

The PORT is of course the number in the range 10000–19999 you selected before.
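
To see how the pieces of the URL fit together, here is a small sketch in plain Python (no Flask needed). It picks the address starting with 10. out of text shaped like the output of hostname -I and builds the “hello” URL from it. The function names pick_usna_ip and hello_url, the port 12345, and most of the sample addresses are made up for illustration; your real output will look different.

```python
def pick_usna_ip(hostname_output):
    """Return the first address starting with '10.' from `hostname -I` style text."""
    for address in hostname_output.split():
        if address.startswith("10."):
            return address
    return None  # no address on the 10.x network was found


def hello_url(host_ip, port):
    """Build the URL of the /hello page for a given IP address and port."""
    return f"http://{host_ip}:{port}/hello"


# Invented sample text; only 10.60.37.232 comes from the lab write-up.
sample_output = "192.168.122.1 10.60.37.232 172.17.0.1"
ip = pick_usna_ip(sample_output)
print(hello_url(ip, 12345))  # 12345 is a placeholder port, not yours
```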

For example, Dr. Roche has his sample solution for the lab running on ssh.cs on port 11985, so you can view his “hello” page by entering this into your web browser address bar:

http://10.1.83.73:11985/hello

Now try changing that so it’s YOUR IP address and port number. If everything is working, you should be able to enter that URL into a web browser and see your message come back. Try it! You should also be able to see the exact same thing by running the curl command from a (new!) terminal window, like

curl 'http://10.1.83.73:11985/hello'

Try it on your own URL as well as the URL of a friend — you should get their message back of course.

Answer these questions in the markdown file:

  1. What is the URL of your “hello” page?

  2. Look carefully at the terminal running your Flask server when you refresh your “hello” page. The Python terminal should spit out a single line to tell you that that page was just accessed and that everything went smoothly. Copy-paste that line for this question.

  3. Access a classmate’s “hello” page. What is their URL and what is the message that you see?

3.3 Submit what you have so far

Submit your work so far:

submit -c=sd212 -p=lab05 lab05.md survey.py

or

club -csd212 -plab05 lab05.md survey.py

or use the web interface

4 Creating the survey questions (30 pts)

4.1 “questions” page and JSON dictionary

Now it’s time to start creating your survey! The first step is to choose the questions. Your survey can be about anything you want, and for the end of the lab you will need to create a visualization (plotly graph) that somehow incorporates data from all the questions in your survey.

Your survey can have two kinds of questions: multiple-choice and short-answer. You need to have at least one of each kind of question.

(Don’t get too bogged down in what the best questions are just yet; you can change things later before your classmates start taking the survey.)

Your questions will be specified using a JSON dictionary that is set up like this:

  • “name” is your name (the survey designer)

  • “questions” is a list of dictionaries, one for each question. Each question should itself have two or three entries:

    • “key” is a short string for an identifier for that question. This will end up being a key in a Python dictionary (hence the name), so pick something short and preferably without spaces, like “food” or “height” or “sleepiness”.

    • “prompt” is a string for the prompt of that question (the text that is displayed to the user so they can fill something in).

    • For multiple-choice questions only, “choices” is a nested list of strings for each of the possible choice options.

      (For short-answer questions, don’t include the “choices” key in that dictionary, just the “key” and “prompt”.)
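
To make that structure concrete, here is one possible dictionary with one multiple-choice and one short-answer question, plus a small helper that checks the shape described above. Everything in it (the designer name, keys, prompts, choices, and the helper check_questions) is invented for illustration; it is not the sample solution and your survey should be your own.

```python
# Hypothetical example survey; every value here is made up.
example_questions = {
    "name": "Example Designer",
    "questions": [
        {   # multiple-choice: has "choices"
            "key": "food",
            "prompt": "What is your favorite food?",
            "choices": ["Pizza", "Tacos", "Sushi"],
        },
        {   # short-answer: no "choices" entry
            "key": "sleep",
            "prompt": "How many hours did you sleep last night?",
        },
    ],
}


def check_questions(info):
    """Return a list of problems found in a questions dictionary (empty if none)."""
    problems = []
    if not isinstance(info.get("name"), str):
        problems.append("missing or non-string 'name'")
    questions = info.get("questions")
    if not isinstance(questions, list):
        return problems + ["'questions' is not a list"]
    for i, q in enumerate(questions):
        for field in ("key", "prompt"):
            if not isinstance(q.get(field), str):
                problems.append(f"question {i}: missing or non-string '{field}'")
        if "choices" in q:
            choices = q["choices"]
            if not (isinstance(choices, list)
                    and all(isinstance(c, str) for c in choices)):
                problems.append(f"question {i}: 'choices' must be a list of strings")
    return problems


print(check_questions(example_questions))
```

A check like this is optional, but it can catch a forgotten “key” before the HTML template complains about it.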

To share your survey with the world, you will add a /questions page to your Flask server that returns a JSON dictionary as specified above.

Because each “page” of your Flask server corresponds to a Python function, this just means adding a function like this:

@app.route('/questions')
def questions():
    questions_dict = {}
    # ... your code to fill in the dictionary as specified above
    return jsonify(questions_dict)

Notice carefully how this works: just like the “hello” page, it’s just a Python function that returns some data. In this case, instead of returning a string, we return a JSON-encoded object by using Flask’s jsonify() function.

(This function is a glorified wrapper around the built-in json.dumps() if you remember that from when we introduced JSON in SD211.)
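
As a quick refresher on the encoding step itself, this sketch uses only the built-in json module to turn a dictionary into JSON text and back. The one-question survey is made up. Keep in mind that jsonify() does more than this: it also wraps the text in a proper web response, so in survey.py you should still use jsonify().

```python
import json

# Hypothetical one-question survey, only for illustration.
questions_dict = {
    "name": "Example Designer",
    "questions": [{"key": "sleep", "prompt": "Hours of sleep last night?"}],
}

encoded = json.dumps(questions_dict)  # Python dict -> JSON text (a str)
decoded = json.loads(encoded)         # JSON text -> Python dict again

print(type(encoded).__name__)
print(decoded == questions_dict)
```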

Save your survey.py and you should see the Flask terminal reload the page automatically. Try navigating to your /questions page in your web browser. If all has gone well, you should see a nice JSON-formatted dictionary as specified, with each question inside its own nested dictionary.

(And if you want an example, see if you can find Dr. Roche’s questions dictionary based on the URL given above.)

4.2 View your survey

JSON is great for computers to understand, but we want humans to take your survey! Because you haven’t taken a web programming class and it’s not the main focus of SD212, we are going to give you an HTML “template” that will display your questions in HTML form.

Download survey-template.html and save it in your folder for this lab as survey-template.html. Or directly on the command-line, you can use wget to download it:

wget "http://roche.work/courses/s24sd212/lab/survey/survey-template.html"

Now you just need to add another function to your survey.py which will load this template for the /survey URL, based on the JSON data you already implemented in the questions() function. Here is the code you can add to your survey.py to make that happen:

@app.route('/survey')
def survey():
    return render_template("survey-template.html", info=questions().get_json())

Now try viewing your survey at http://YOUR_IP:YOUR_PORT/survey

Be sure to keep an eye on your terminal that is running your flask server! If there is a problem with your questions JSON dictionary, like if you forgot to specify the key for some question, or if your choices is not a list of strings, etc, then you might see some helpful error messages there!

Note: the submit button on your survey will definitely not work yet if you click it. That is the next part of the lab!

When your survey looks OK, answer these questions in the markdown file:

  1. What is your survey about?

  2. How many short-answer and multiple-choice questions does your survey have? (Remember, it should have at least one of each kind.)

4.3 Submit what you have so far

Submit your work so far:

submit -c=sd212 -p=lab05 lab05.md survey.py survey-template.html

or

club -csd212 -plab05 lab05.md survey.py survey-template.html

or use the web interface

4.4 Milestone

For this lab, the milestone means everything up to this point, which includes the following auto-tests:

md_part3
md_part4

Remember: this milestone is not necessarily the half-way point. Keep going!

5 Collect the responses (30 pts)

5.1 The request dictionary

The next step is to get the survey submission actually working. This will be another page that your Flask server provides, which means another Python function.

Start with this:

@app.route('/answer', methods=['POST'])
def answer():
    print("vvvvvvvvvvvvvv DICTIONARY vvvvvvvvvvvvvvv")
    print(dict(request.form))
    print("^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^")
    return "Got it"

Note the slight difference in the app.route here where we specify methods=['POST']. That is because, unlike the other Flask pages you are creating, this one needs to actually receive data from a submitted web form.

Now try loading your survey at http://YOUR_IP:YOUR_PORT/survey and actually filling in and submitting it. Two things should happen:

  • In the web browser, you see just the returned message “Got it”.
  • In the Flask terminal, you should see the print statements.

Look carefully at that debugging output in the Flask terminal. It is showing you that request.form is a dictionary with keys for name and anything else you specified as a key for each of your questions, and the values in the dictionary should be whatever you actually typed in on the web page.
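
If you are curious where that dictionary comes from, the sketch below mimics the round trip with the standard library, assuming the form is sent in the default URL-encoded format. Flask does this decoding for you; the code here only illustrates the idea, and the names and answers are invented.

```python
from urllib.parse import urlencode, parse_qsl

# Hypothetical answers typed into the form; the keys are made-up question keys.
typed_in = {"name": "Alice", "food": "Pizza", "sleep": "7"}

body = urlencode(typed_in)        # roughly what the browser sends in the POST body
received = dict(parse_qsl(body))  # roughly what dict(request.form) looks like

print(body)
print(received)
```

Notice that every value arrives as a string, even the “7”. If you want a number later on, you have to convert it yourself.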

Every time you (or someone else) submits your survey, this Python function will be called! Now you need to modify it to actually save the responses into a file.

5.2 Saving to responses.csv

You should modify the function started above so that it saves the survey responses into a CSV file called responses.csv.

Your csv file should pretty much match up with what comes directly off the form. So you just need one column for name and one column for each of your question keys.

In order to save some headaches later, you may want to do some basic data processing/cleaning here! Make sure the responses are in the correct format so that you will be able to analyze them later. Make sure they aren’t blank or missing! If something is wrong, you should return a message like “ERROR”.

If the data looks OK and you successfully save it to the csv, then instead return a message like “Thank you NAME. Your response has been recorded” (using the responder’s actual name of course).
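
Here is a rough sketch of what such checking could look like, written as plain functions on a dictionary so you can try it outside of Flask. The helper names check_response and reply_for, the example keys, and the exact error wording are all assumptions for this example. Which checks make sense depends on your own questions.

```python
def check_response(form, required_keys, choices_by_key):
    """Return (cleaned, error); exactly one of the two is None."""
    cleaned = {}
    for key in required_keys:
        value = form.get(key, "").strip()  # missing keys count as blank
        if value == "":
            return None, f"ERROR: missing answer for '{key}'"
        allowed = choices_by_key.get(key)
        if allowed is not None and value not in allowed:
            return None, f"ERROR: '{value}' is not a valid choice for '{key}'"
        cleaned[key] = value
    return cleaned, None


def reply_for(form, required_keys, choices_by_key):
    """Build the message to send back to the person who submitted the form."""
    cleaned, error = check_response(form, required_keys, choices_by_key)
    if error is not None:
        return error
    return f"Thank you {cleaned['name']}. Your response has been recorded"


# Hypothetical survey layout and submissions.
required = ["name", "food", "sleep"]
choices = {"food": ["Pizza", "Tacos", "Sushi"]}
good = reply_for({"name": " Alice ", "food": "Pizza", "sleep": "7"}, required, choices)
blank = reply_for({"name": "Bob", "food": "Pizza", "sleep": "  "}, required, choices)
print(good)
print(blank)
```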

Important: You can use pandas or just the built-in open and print to save to the responses.csv file. But make sure you are just adding to the end and not overwriting the previous responses! You should get one more row in the csv every time someone submits the survey.

Hint: You have some figuring-out to do here. You do have experience from prior labs in creating and writing CSV files. But you don’t have a lot of experience of adding to an existing CSV file. You definitely have all the Python skills you need to accomplish this, I promise! If you are getting stuck, please reach out for help.
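
If you want a nudge in the right direction, here is a minimal sketch of appending rows using only the built-in csv module. The function name append_row, the file name demo.csv, and the columns are made up for illustration; your own columns depend on your question keys, and pandas would work just as well.

```python
import csv
import os
import tempfile


def append_row(path, fieldnames, row):
    """Append one response to a CSV file, writing the header only for a new file."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # Mode "a" adds to the end of the file instead of overwriting it.
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


# Try it out in a temporary folder so nothing real gets touched.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.csv")
    columns = ["name", "food", "sleep"]
    append_row(demo_path, columns, {"name": "Alice", "food": "Pizza", "sleep": "7"})
    append_row(demo_path, columns, {"name": "Bob", "food": "Tacos", "sleep": "6"})
    with open(demo_path, newline="") as f:
        demo_rows = list(csv.reader(f))

print(demo_rows)
```

The important part is the "a" (append) mode. Opening the file with "w" would wipe out all the earlier responses every time someone submits.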

5.3 Gather some responses

Fill out your own survey. Get some friends to fill it out too! This isn’t a “real” experiment, so it’s OK to have the same person fill it out multiple times to get some more data.

  1. Describe one specific way I could submit your survey form so that it triggers your error checking and does not get recorded in responses.csv.

  2. Enter the names of at least three classmates who successfully filled out your survey.

5.4 Submit what you have so far

Submit your work so far, including the responses csv file:

submit -c=sd212 -p=lab05 lab05.md survey.py survey-template.html responses.csv

or

club -csd212 -plab05 lab05.md survey.py survey-template.html responses.csv

or use the web interface

6 Create a dynamic visualization (30 pts)

The last step is to create one more page in your Flask application that produces a nice visualization (graph) of the survey results.

Your page should be called /visual and it should return a web page for some kind of graph based on the survey results in responses.csv.

I strongly recommend using pandas to load and process the csv, and plotly express to create the graph.

Note: We have used Plotly many times in SD211 and SD212. So far, we usually create the figure and then call .show(), something like:

import plotly.express as px

# ... create a dataframe called df

fig = px.scatter(df, x="year", y="price")
fig.show()

But that’s not what we want here. You don’t want the graph to show up only in a browser on whatever machine you happen to be running Flask from; you want to share it with the world (or at least the small world of the USNA intranet).

So instead of calling fig.show(), in your Flask function you will do

return fig.to_html()

which turns the graph into a web page (HTML) and returns that to the web browser that made the request.

Once this works, you should be able to type

http://HOSTIP:PORT/visual

into a web browser (using your own IP address and port) and see your pretty graph pop up. And if you share that URL with someone else, they should see your graph in their web browser.

Most importantly, the visualization should be dynamic, meaning it is automatically re-generated and kept up-to-date with the most recent survey data. That is: if 10 more people submit the survey and you refresh the /visual page, the graph should actually change in response to those new survey results.
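
The trick to being dynamic is to read the file inside the function that handles the request, not once when survey.py starts up. The sketch below shows that idea with plain Python in place of pandas and plotly: each call to the made-up helper tally re-reads the file, so new rows show up in the counts right away. The file name and data are invented for the demonstration.

```python
import csv
import os
import tempfile
from collections import Counter


def tally(path, column):
    """Re-read the CSV at `path` and count each distinct value in `column`."""
    with open(path, newline="") as f:
        return Counter(row[column] for row in csv.DictReader(f))


# Demonstrate in a temporary folder with invented responses.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.csv")
    with open(demo_path, "w", newline="") as f:
        f.write("name,food\nAlice,Pizza\nBob,Tacos\n")
    before = tally(demo_path, "food")

    # Someone else submits the survey...
    with open(demo_path, "a", newline="") as f:
        f.write("Carol,Pizza\n")
    after = tally(demo_path, "food")  # ...and the next call sees the new row

print(before["Pizza"], after["Pizza"])
```

In your /visual function the same pattern applies: load responses.csv and build the figure every time the function runs.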

  1. Name at least one classmate who loaded your visualization in their own browser using the URL you sent them.

  2. Give a few sentences on what you think about the actual results of your survey. Did you discover anything interesting? What does your visualization tell us about the data?

    Save an image of your visualization as graph.png and submit that along with everything else for this lab.

  3. Name one of your classmates whose visualization you looked at in your web browser using their URL, and write a sentence or two on what you think about the results of their survey.

7 Submit your work

I’m putting these questions last this time to make sure you don’t forget them.

  1. What sources of help (if any) did you utilize to complete this lab?

  2. What did you think of the lab overall?

Save your files and submit everything:

submit -c=sd212 -p=lab05 lab05.md survey.py survey-template.html responses.csv graph.png

or

club -csd212 -plab05 lab05.md survey.py survey-template.html responses.csv graph.png

or use the web interface