

This is the archived website of SD 212 from the Spring 2023 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

Software Installs and Environment Setup

General Tips

Getting your software and programming tools working correctly can be tedious and annoying, but it’s an important part of the life of a data scientist.

Before you get started in any new install, make sure you have time to complete it. That means having plenty of battery (or an outlet available), a fast internet connection, and nowhere else to be for the next hour.

You will need all of the software packages below. Most of them you should already have from previous classes; come back here and re-install if your laptop gets wiped by ITSD or something stops working.

WSL/Ubuntu

  • Laptop only (already on the lab machines)
  • Should be fine from SD211 except the part below about creating your sd212 directory

Installation

This should be fine from SD211; it is the same as Step 1 in the SD211 setup instructions.

  1. Open a powershell as Windows administrator: hit WindowsKey+R to bring up the run dialog, type powershell, then hit Ctrl+Shift+Enter to run as administrator.

  2. Run this command in the powershell:

    Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

  3. Restart Windows

  4. Open Software Center and install “Ubuntu 20.04 LTS”

  5. After the install completes, open Ubuntu for the first time. You will be prompted to choose a username and password.

    You must choose your USNA username m2XXXXX as the username.

    Use any simple password. It doesn’t need to be (and probably shouldn’t be) the same as your USNA password, and is only used to install software and other stuff inside Ubuntu on your laptop, so security isn’t a huge concern here. Keep it simple and memorable.

  6. Within the Ubuntu terminal you just opened, run these commands to fix your bash settings:

    sed -i.bak s/"@.h...033.00m.."// ~/.bashrc
    printf "\n\nexport DISPLAY=:0\nexport REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt\n" >> ~/.bashrc
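
To confirm that step 6 took effect, a quick check along these lines may help. This is only a sketch: the function name check_bashrc_exports is made up for illustration, and all it does is look for the two export lines that the printf command above appends.

```shell
# Hypothetical helper (not part of the course setup): report whether the two
# export lines from step 6 are present in a bashrc file.
check_bashrc_exports() {
    local rc="${1:-$HOME/.bashrc}"
    local missing=0
    local line
    for line in \
        'export DISPLAY=:0' \
        'export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt'
    do
        # -x: match the whole line, -F: treat the pattern as plain text
        if grep -qxF -- "$line" "$rc" 2>/dev/null; then
            echo "found:   $line"
        else
            echo "missing: $line"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_bashrc_exports with no argument checks ~/.bashrc. If a line is reported missing, re-running the printf command from step 6 should add it.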

Software update

The software inside WSL/Ubuntu is not updated automatically by ITSD/Windows like everything else on your laptop. You should run this update periodically; the start of the semester is a good time, at the very least.

First, open an Ubuntu terminal from the start menu. Then run these two commands, in order. When asked, enter your simple Ubuntu password.

sudo apt update
sudo apt full-upgrade
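
If you are not sure whether an update is due, one rough way to find out is to count the packages that apt reports as upgradable. The sketch below assumes the usual output format of apt list --upgradable, where each pending package is marked "[upgradable from: ...]". The helper name count_upgradable is made up for illustration.

```shell
# Hypothetical helper: count pending upgrades in `apt list --upgradable` output
# read from standard input. Lines for upgradable packages contain the text
# "[upgradable from: <old version>]".
count_upgradable() {
    # grep -c prints the number of matching lines; it exits non-zero when the
    # count is 0, so "|| true" keeps the helper itself from reporting failure.
    grep -c 'upgradable from:' || true
}

# Usage inside Ubuntu (stderr is discarded because apt warns when its output
# is piped):
#   apt list --upgradable 2>/dev/null | count_upgradable
```

A result of 0 right after running the two commands above suggests everything is current.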

USNA SSL certificates

(Should already be done from SD211.)

Run this command from an Ubuntu terminal so that Ubuntu plays nice on the USNA network. If prompted, enter your simple Ubuntu password.

curl http://apt.cs.usna.edu/ssl/install-ssl-system.sh | bash

SD212 Directory

Run this command from an Ubuntu terminal so that you get a directory called sd212 which is visible on your desktop as well as within Ubuntu.

winhome=$(wslpath "$(wslvar USERPROFILE)")
mkdir -p "$winhome/Desktop/sd212"
ln -sfn "$winhome/Desktop/sd212" ~/sd212
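
To double-check the result, something like the following can confirm that ~/sd212 is a symbolic link pointing at a real directory. It is a sketch only, and check_sd212_link is a made-up name for illustration.

```shell
# Hypothetical helper: succeed only if the given path is a symbolic link that
# resolves to an existing directory.
check_sd212_link() {
    local link="${1:-$HOME/sd212}"
    # -L tests for a symlink; -d follows the link and tests its target
    if [ -L "$link" ] && [ -d "$link" ]; then
        echo "ok: $link -> $(readlink "$link")"
    else
        echo "problem: $link is not a symlink to an existing directory"
        return 1
    fi
}
```

With no argument it checks ~/sd212; a "problem" message usually means the three commands above need to be run again.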

SSH keys

(Should already be done from SD211.)

Setting up SSH keys makes it so that you can easily access the CS department server and lab machines through SSH without having to type your password every time.

Run these commands from an Ubuntu terminal window. The third command may prompt you for a password; enter your USNA password, not your Ubuntu password.

mkdir -p ~/.ssh
[[ -e ~/.ssh/id_ed25519 ]] || ssh-keygen -t ed25519 -N ''
ssh-copy-id "$USER"@midn.cs.usna.edu
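
A possible way to verify the key setup is sketched below. Both function names are hypothetical. The second one relies on ssh's BatchMode option, which makes ssh fail rather than ask for a password, so a successful run indicates the key was accepted.

```shell
# Hypothetical helper: succeed if an ed25519 key pair exists in the directory.
have_ssh_key() {
    local dir="${1:-$HOME/.ssh}"
    [ -f "$dir/id_ed25519" ] && [ -f "$dir/id_ed25519.pub" ]
}

# Hypothetical helper: attempt a login that is not allowed to prompt.
# Success means the server accepted the key without a password.
can_login_without_password() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$USER@midn.cs.usna.edu" true
}

# Usage:
#   have_ssh_key && can_login_without_password && echo "SSH keys are working"
```

If the first check passes but the second fails, re-running the ssh-copy-id command above is a reasonable next step.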

Mamba (was Conda)

You need to do this BOTH on your laptop and on a lab machine or midn.cs.

In SD211 we used conda to install Python packages. That works, but it can get really slow at times. So, this semester we’ll use a newer tool called mamba.

Mamba actually does exactly the same things as Conda and uses all the same packages, but it runs way faster. Here’s how to install it.

  1. Download mambaforge

    Open a terminal and run the following to download the mambaforge installer from github:

    wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh

    This will download a file of about 100MB called Mambaforge-Linux-x86_64.sh into your current directory.

  2. Run the mambaforge installer

    The installer you downloaded is actually a bash script. To run it, type:

    bash Mambaforge-Linux-x86_64.sh

    Follow the prompts when asked. When asked whether you want to initialize conda at the end, type yes.

  3. Close the terminal and reopen a new terminal

    (You have to start a new terminal session for bash to know about the installation you just did.)

  4. Install a new sd212 environment with a bunch of packages

    We will start with all the packages we used in sd211, but may add more later in this semester.

    Run this from the command line in your new terminal:

    mamba create -n sd212 numpy pandas ipykernel matplotlib plotly seaborn scikit-learn opencv bs4 lxml nltk easygui wordcloud openpyxl

    This will need to download a bunch of packages totaling around 400MB. Type Y when prompted and watch it go!

    This might take a minute if you are on a slow connection, but should be much faster than when we used conda instead of mamba!

  5. (Optional) Remove anaconda

    From the command line, run this command to wipe out your old conda stuff since we are using mamba now:

    rm -rf ~/anaconda3
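
Once the sd212 environment from step 4 exists, a quick import test can show whether the packages are usable. The sketch below is an illustration, not an official course script: check_imports is a made-up name, and it assumes the environment has been activated first (typically with conda activate sd212, though the exact command may depend on the installed version). Note that some packages are imported under a different name than they are installed with: scikit-learn is sklearn and opencv is cv2.

```shell
# Hypothetical helper: try to import each named Python module and report the
# result. PYTHON defaults to python3, which inside an activated environment
# is that environment's interpreter.
check_imports() {
    local py="${PYTHON:-python3}"
    local failed=0
    local mod
    for mod in "$@"; do
        if "$py" -c "import $mod" 2>/dev/null; then
            echo "ok:     $mod"
        else
            echo "FAILED: $mod"
            failed=1
        fi
    done
    return "$failed"
}

# Usage, after activating the environment:
#   check_imports numpy pandas matplotlib plotly seaborn sklearn cv2 bs4 \
#       lxml nltk easygui wordcloud openpyxl
```

Any module reported as FAILED can usually be fixed by installing its package into the sd212 environment again.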

Xming

Xming is a small Windows utility that lets you display GUIs from WSL or ssh.

Install

  1. Visit https://sourceforge.net/projects/xming/
  2. Download Xming
  3. Run as administrator the file you just downloaded

Running/restarting

Xming should be running all the time on your laptop, before you open VS Code for example. If Xming is running, you will see its little X icon in the system tray at the bottom-right of the taskbar.

If not, then maybe Xming got closed or crashed for some reason. You should be able to find the Xming program in the start menu and just click it to start it up again.
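
You can also probe for Xming from an Ubuntu terminal. The sketch below rests on two assumptions: that X display number N listens on TCP port 6000 + N (the standard X11 convention), and that in this WSL setup Windows programs are reachable at 127.0.0.1. Both function names are hypothetical.

```shell
# Hypothetical helper: TCP port for a display string such as ":0" or
# "127.0.0.1:0.0". Falls back to $DISPLAY, then to ":0".
x_port_for_display() {
    local d="${1:-${DISPLAY:-:0}}"
    d="${d##*:}"   # drop the host part, leaving "N" or "N.screen"
    d="${d%%.*}"   # drop the optional screen number
    echo $((6000 + d))
}

# Hypothetical helper: succeed if something accepts local connections on the
# port for that display. Uses bash's /dev/tcp feature with a 2 second limit.
x_server_listening() {
    local port
    port=$(x_port_for_display "$1")
    timeout 2 bash -c "exec 3<>/dev/tcp/127.0.0.1/$port" 2>/dev/null
}

# Usage:
#   x_server_listening && echo "X server found" || echo "start Xming"
```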

VS Code

(This should already be done from SD211; see step 3 in the SD211 setup instructions.)

Installation

  1. Go to https://code.visualstudio.com/ and download for Windows.
  2. Run the installer after it downloads.

Setup WSL

  1. Open VSCode
  2. Open the “Extensions” pane (icon with 4 squares on the left side)
  3. Search for and install the “Remote Development” extension from Microsoft.
  4. After that install completes, click the green icon at the very bottom-left and select “New WSL Window”. It should now say “WSL: Ubuntu 20.04” at the bottom left.
  5. Now open “Extensions” again. Install the Python and Jupyter extensions from Microsoft.

Setup ssh connection to midn.cs

  1. Close VS Code and open a Powershell in Windows. Run this command from Powershell:

    setx DISPLAY "127.0.0.1:0.0"

  2. Close Powershell and start VS Code again.

  3. Open the Remote Explorer (it’s the computery icon on the left side of the window)

  4. In the remote explorer, click the little + sign next to SSH to add a new SSH remote host. In the box that pops up, type

    ssh m2XXXXX@midn.cs.usna.edu -XY

  5. Now you should be able to connect to midn.cs.usna.edu from the remote explorer. When you are connected, a whole new VS Code window will come up and it will say “SSH:midn.cs.usna.edu” in green on the bottom-left.

  6. After you are ssh’d to midn.cs, you have to install the VS Code extensions for Python and Jupyter again.

  7. To go back to your local laptop’s WSL (not SSH), just click the green icon on the bottom-left and select WSL again.
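
The host added in step 4 is stored as an entry in an SSH config file. If the connection ever needs to be repaired by hand, the entry should look roughly like what the sketch below prints. The function name is made up for illustration, and VS Code may write the entry slightly differently. In OpenSSH, the -X flag corresponds to ForwardX11 and -Y to ForwardX11Trusted.

```shell
# Hypothetical helper: print an OpenSSH config entry roughly equivalent to
#   ssh USER@midn.cs.usna.edu -XY
ssh_config_stanza() {
    local user="$1"
    cat <<EOF
Host midn.cs.usna.edu
    HostName midn.cs.usna.edu
    User $user
    ForwardX11 yes
    ForwardX11Trusted yes
EOF
}

# Print the entry with the placeholder username from step 4.
ssh_config_stanza m2XXXXX
```

Replace m2XXXXX with your own USNA username when comparing against your config file.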