Unit 7: Concurrency
1 Overview
So far all of the programs and shell scripts we have written are inherently serial: do one thing, then another thing, then another thing, etc. until the job is done.
But this is a wasteful use of the powerful computing resources at our disposal! Every modern computer — even your cell phone — has at least a handful of independently-executing CPU cores. Why should just one of them be doing useful “work” on some data science task?
In this unit, we will get a small glimpse into the world of concurrent
programming, where we write code to execute more than one thing at the
same time. Starting with bash, we will first learn about how to manage
multiple processes can interact on the command line and the “familial”
relationships between them. Then we will see how to do the same thing
with Python’s multiprocessing library, specifically
concurrent.futures.ProcessPoolExecutor.
As with many things we cover in your intro classes, this is not an exhaustive view of the subject of parallel programming! The art of designing efficient concurrent programs is something you could take multiple whole courses on. Our goal here is to introduce the main concepts of multiprocessing, and to see some of the advantages (and dangers) it can provide.
2 Resources
The Linux Command Line Chapter 10: Processes (Required)
This chapter of TLCL goes over how multiple processes coexist and how to get information about them in Linux. You can gloss over the details in the section on Signals, but otherwise this is all good and useful info for us.
Pay special attention to job control: how to start, stop (pause) and kill a process, and how to execute processes in the background.
Python in a Nutshell Chapter 15: Concurrency
This is not one of our “usual” Python textbooks, but it’s still a good one for the level we are at.
3 Bash
3.1 Processes
A process is a running program. When you write code in Python or another programming language, you’ve created a program. Actually executing/running that program creates a process.
We use a lot of human/family terminology when talking about processes:
Processes have a lifetime. They are created when they start running, then they are alive while running, and then they might either terminate normally or be killed by another command.
Each process has a parent, which is the process that started it. For example, when you open a terminal in VS Code, the code process is the parent of the bash process in the terminal. If you run a command like grep in the terminal, then that bash process will be the parent of grep. Similarly, if a running program starts another one, that new process is a child of the parent process.
This goes further to things like “orphans” and “widows”, but we won’t get into that for this course.
How can we refer to a process? We could use the command that the process is running, but that’s not enough, since (for example) there could be multiple instances of the same program running at once.
- Every process has a unique process identifier or PID. It is a number, usually 6 or 7 digits on modern versions of Linux.
- Every process also has a parent, which is the process that launched it. The parent process ID is called the PPID.
- The only exception is the process with PID 1, which is the first process started by the OS kernel. On modern Linux, this process is called systemd. It does not have a parent, and its PPID is listed as 0. If you follow the parents-of-parents of any Linux process, you will eventually get back to PID 1.
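One quick way to see these numbers for yourself (a sketch; the actual PID values will of course differ on your machine): bash stores its own PID in the special variable $$, and the -f option to ps shows both the PID and PPID columns.

```shell
echo $$       # the PID of this bash process
ps -fp $$     # full listing for this process; the PPID column shows its parent
ps -fp 1      # PID 1 (systemd on modern Linux); its PPID is shown as 0
```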
3.2 Backgrounding
When you run a command in bash like

roche@ubuntu$ wget 'https://www.theinspiration.com/wp-content/uploads/016-3-1074x1074.webp'

the default behavior of bash is to run that program as a new process in the foreground. That means that the program runs in the terminal for as long as it takes, and bash waits patiently for it to finish. When the process is finished, bash steps in again and gives you a new prompt.
Instead, we can add the ampersand & at the end of any command line or
after a line in a bash script to run that process in the background. A
background process runs just like a foreground process. The difference
is with bash itself: instead of waiting for the process to finish, bash
instead lets the process go on in the background and immediately gives
you a new prompt.
This is our first foray into multiprocessing, where we can get the
computer to do multiple things at the same time. When you execute a
program in the background in bash (with the &), you can then
immediately start another process in the background, and another, etc.
The special command wait will pause the current bash terminal or bash
script until all background commands are finished.
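For example, the following script (a sketch, with sleep standing in for real work such as downloads) launches three background processes and then waits for all of them:

```shell
#!/bin/bash
# Three "jobs" running at the same time in the background
sleep 2 &
sleep 2 &
sleep 2 &
wait               # pause until all three background processes finish
echo "all done"    # appears after about 2 seconds, not 6
```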
3.3 Signals (killing)
Normally, each of the many processes running at a given time is blissfully unaware of the hundreds of other processes that are also competing for limited hardware resources. The operating system partitions memory and schedules the CPU and I/O devices so that each process gets what it needs without interference.
But of course sometimes we need to communicate between processes! One way to do this which we have already seen is a bash pipeline, where the output stream of one process is hooked into the input stream of another one.
In Linux, signals are used to “interrupt” a process, usually triggered
by another process or by the OS. There are only a small number of
possible signals, with specific meanings, and each with a numeric code.
You can type man 7 signal to learn about all of them.
For us, there are three signals we should know about:
- SIGINT (2): "Interrupt" a process. This is the signal that gets sent when you type Ctrl-C while a process is running.
- SIGTERM (15): Nicely ask to terminate a process. This is the default signal sent with the kill command.
- SIGKILL (9): Forcibly terminate ("kill") a process.
The first signal SIGINT (Ctrl-C) can be ignored by a program if it wants to. For example, if you type Ctrl-C at the bash terminal without any program currently running, bash just ignores it and gives you another prompt.
The last one, SIGKILL, cannot be ignored. If you send a SIGKILL, it will end that process immediately, even if it's in the middle of doing something super-important.
The kill command can be used to send a signal to a specific process.
Here is an example of running a wget download command in the
background, then using ps to see the PID of that wget command as it
runs, and then kill to send a Ctrl-C signal to make wget quit early:
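A sketch of such a session (with a long sleep standing in for the wget download so it runs anywhere; bash stores the PID of the most recent background process in the special variable $!):

```shell
sleep 300 &    # stand-in for a long wget download, run in the background
pid=$!         # $! holds the PID of the most recent background process
ps $pid        # confirm it is still running
kill -2 $pid   # send SIGINT, the same signal as typing Ctrl-C
wait $pid      # reap the child; it has now quit early
```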

3.4 Useful commands
- ps: Get info about currently running processes in the current terminal. Some variants/options:
  - ps -A: Get info about all processes, not just the ones in the current terminal
  - ps -f: Display a full listing that includes the PPID
  - ps 123456: Show info about the process with PID 123456
- top: Show a live update of all processes running, sorted by default by CPU usage. (Similar to Windows Task Manager, but more nerdy.)
- htop: Better, more colorful version of top
- &: Not a command, but can be added to the end of any command or pipeline to make it run in the background
- wait: Pause execution until all background processes terminate
- sleep: Pause for a given number of seconds
- kill: Send a signal (default is SIGTERM) to a process with the given PID
  - kill -2 123456: Ahem hello process 123456, I'd like to get your attention. Whoops, did I startle you and cause your death?
  - kill -15 123456: Process 123456, please die when you get a free moment.
  - kill -9 123456: I'm not asking anymore. You're dead.
4 Python
In Python we can do something very similar to running a command in the background in bash, by executing a Python function in a separate Python process (multiprocessing).
4.1 Multiprocessing
Consider the following program, which finds and prints out the smallest prime numbers after 100,000, after 200,000, etc.:
def is_prime(p):
    """Returns whether or not p is a prime number."""
    # (Note, this function is intentionally slow, for demo purposes!)
    d = 2
    while d < p:
        if p % d == 0:
            # p is divisible by d, so not prime
            return False
        d += 1
    # p has no divisors between 1 and p
    return True

def next_prime(n):
    """Computes and prints out the next largest prime after n."""
    p = n
    while not is_prime(p):
        p += 1
    print(p)

if __name__ == '__main__':
    for x in range(1000000, 10000000, 1000000):
        next_prime(x)

Don't worry too much about the details of primality checking. But notice
the general structure: we are calling a function next_prime
repeatedly, with different input arguments, and that function will
print out the results of each call.
This is an ideal candidate for multiprocessing! Here is what that same
program looks like when each of the function calls to next_prime is
run in its own process:
from primes import next_prime
from multiprocessing import Process

if __name__ == '__main__':
    children = []
    for x in range(1000000, 10000000, 1000000):
        child = Process(target=next_prime, args=[x])
        child.start()
        children.append(child)
    for child in children:
        child.join()
    print("All done")

Notice a few components, which will be common for just about every time you want to use multiprocessing in Python:
- Need to import the Process class from the multiprocessing library
- Always use an if __name__ == '__main__' block to run multiprocessing stuff. (Otherwise, you risk each of the children spawning more and more children on some operating systems.)
- Use the Process constructor with two named arguments:
  - target is the name of the function you want to call
  - args is a list or tuple of the arguments that will be passed to that function
- Save your Process objects in a list (for later use)
- Call the .start() method on each Process. (If you forget this step, then they won't actually run and your program will do nothing!)
- Later, in a separate loop, call the .join() method on each process. This causes the parent process to wait for each child to finish. (Make sure you understand why it needs to be a separate loop rather than calling .join() right after .start().)
You can run the program above on the command line with the time
built-in to test it for yourself. Here it is on my 4-core laptop. First
the original version without multi-processing:
roche@ubuntu$ time python3 primes.py
1000003
2000003
3000017
4000037
5000011
6000011
7000003
8000009
9000011

real	0m3.281s
user	0m3.269s
sys	0m0.012s
What we see here is first the actual output (the 9 prime numbers), in order as expected. Then the timing results say that this took about 3.3 seconds of “real” time, of which almost all of that was spent with the CPU working (“user” time measures normal CPU usage).
Now look carefully at what happens when we run the multi-processing version instead:
roche@ubuntu$ time python3 primes-mp.py
1000003
2000003
3000017
4000037
5000011
7000003
6000011
8000009
9000011
All done

real	0m1.151s
user	0m5.936s
sys	0m0.047s
There are some key differences here:
It got faster! The real time it took was cut to roughly a third, about 1.2 seconds.
The output is not in order. In fact, if I run this again, the order will be slightly different each time. That is because each process is running simultaneously, so they can finish (and print their results) in different orders on different runs.
The CPU time went up. The “user” time measures the TOTAL CPU usage across all processes. Because there is some “overhead” cost for starting a new process, this can be higher than the single-process version. But because the CPU time is much larger than the “real” time, that’s a great indication that our program is taking good advantage of the 4-core CPU in my laptop.
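To interpret numbers like these on your own machine, it helps to know how many CPU cores you have; Python's standard library can tell you:

```python
import os

# Number of CPU cores visible to Python (e.g. 4 on the laptop above)
print(os.cpu_count())
```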
4.2 ProcessPoolExecutor
There is a simpler abstraction over the low-level Process objects in
Python, the ProcessPoolExecutor from the concurrent.futures
library that is part of standard Python.
This is currently the best way to write multiprocessing code in Python and this is how we recommend you write parallel code in Python for SD212.
There are four main steps to designing a parallel program in Python using ProcessPoolExecutor:
1. Write a function to do some "chunk" of the overall work. This depends on the problem you are actually solving, but the idea is that each function call will be a single "task" that a parallel process solves. Designing what this function does, what arguments it takes, and what it returns, is the most important part of making a good parallel program.
2. Inside an if __name__ == '__main__' block, start the "worker" processes with a with statement like

   with ProcessPoolExecutor(5) as exe:

   The 5 here is the number of "worker" processes that will be started to handle the tasks that come later.
3. Inside that with statement, write a loop to start each task and add them to a list, using calls to exe.submit(). Crucially, we pass the name of the function to exe.submit(), followed by the arguments. This is important because we are not trying to actually perform this function call directly, but just pass it along as a task to a worker process.
4. After all the tasks have been started, write another loop to call task.result() on each task, returning whatever that corresponding function call returned. There will often be some post-processing necessary with the return values.
Here is a version of the primes program we have been using as a running example, this time using ProcessPoolExecutor and showing each step we need to think about:
from primes import is_prime
### STEP 0: Import ProcessPoolExecutor
from concurrent.futures import ProcessPoolExecutor

### STEP 1: Write the function for each task
def next_prime(n):
    """Computes and prints out the next largest prime after n."""
    p = n
    while not is_prime(p):
        p += 1
    print(p)

if __name__ == '__main__':
    ### STEP 2: with statement to start the PPE workers
    with ProcessPoolExecutor(5) as exe:
        ### STEP 3: Loop to submit all the tasks
        tasks = []
        for x in range(1000000, 10000000, 1000000):
            t = exe.submit(next_prime, x)
            tasks.append(t)
        ### STEP 4: Separate loop to call result() on each task
        for task in tasks:
            task.result()
    print("All done")

4.3 Tasks vs workers
One of the cool advantages of the ProcessPoolExecutor is that we get to separate the idea of tasks from worker processes in our program.
The worker processes are started by the with statement, like

    with ProcessPoolExecutor(8) as exe:

Usually, we want to tune this to correspond to the actual hardware resources available (CPU cores).
The tasks are started inside the with statement, by calling submit() in some kind of loop, like

    for x in range(1000000, 10000000, 1000000):
        exe.submit(next_prime, x)

The number of tasks might be exactly the same as the number of worker processes, or it might be a little bit larger. If there are a few more tasks than workers, then the ProcessPoolExecutor will assign some of the tasks initially, then assign the remaining tasks one by one as each worker finishes its initial task, and so on.
The point here is that we can separate the physical CPU constraints from the logic of the program. Does your problem naturally split into 20 sub-problems? Great - then you will submit 20 tasks. If the number of actual worker processes is only 8, it’s fine; those 8 workers will do 8 tasks at a time, eventually completing all 20 tasks.
4.4 Communication back to the main process
This is important. In the next_prime example above, each “task”
function just needs to print something out, independently of all the
other tasks. That means that each call to task.result() returns
None, and the program becomes a bit simpler because there isn’t any
“post-processing” that the parent process has to do.
Now let’s look at an example where that does happen. Here is a program that uses this API to retrieve 20 cat facts, and print out the longest one.
import requests
requests.packages.urllib3.disable_warnings()

def cat_fact():
    """Retrieves and returns a random cat fact from the API."""
    resp = requests.get('https://catfact.ninja/fact', verify=False)
    return resp.json()['fact']

longest = None
for _ in range(20):
    fact = cat_fact()
    if longest is None or len(fact) > len(longest):
        longest = fact
print(longest)

Notice crucially that we don't want to print out every fact. Each task should be retrieving a single fact, but then the main process needs to do the logic to find the longest one.
Here is that same program, now augmented to use a ProcessPoolExecutor:
import requests
from concurrent.futures import ProcessPoolExecutor
requests.packages.urllib3.disable_warnings()

def cat_fact():
    """Retrieves and returns a random cat fact from the API."""
    resp = requests.get('https://catfact.ninja/fact', verify=False)
    return resp.json()['fact']

if __name__ == '__main__':
    longest = None
    with ProcessPoolExecutor() as exe:
        tasks = []
        for _ in range(20):
            tasks.append(exe.submit(cat_fact))
        for task in tasks:
            fact = task.result()
            if longest is None or len(fact) > len(longest):
                longest = fact
    print(longest)

Notice how the logic to "combine" the results from each task is shifted inside the with statement.
Also notice how (crucially!) there is a separate loop to start all the
tasks with exe.submit(), before we begin the separate loop to call
result() on each one.
Try downloading and running these yourself - you should see that the parallel version works many times faster by essentially performing multiple downloads at the same time.