Unit 7: Concurrency
1 Overview
So far all of the programs and shell scripts we have written are inherently serial: do one thing, then another thing, then another thing, etc. until the job is done.
But this is a wasteful use of the powerful computing resources at our disposal! Every modern computer — even your cell phone — has at least a handful of independently-executing CPU cores. Why should just one of them be doing useful “work” on some data science task?
In this unit, we will get a small glimpse into the world of concurrent
programming, where we write code to execute more than one thing at the
same time. Starting with bash, we will first learn about how to manage
multiple processes can interact on the command line and the “familial”
relationships between them. Then we will see how to do the same thing
with Python’s multiprocessing library, specifically
concurrent.futures.ProcessPoolExecutor.
As with many things we cover in your intro classes, this is not an exhaustive view of the subject of parallel programming! The art of designing efficient concurrent programs is something you could take multiple whole courses on. Our goal here is to introduce the main concepts of multiprocessing, and to see some of the advantages (and dangers) it can provide.
2 Resources
The Linux Command Line Chapter 10: Processes (Required)
This chapter of TLCL goes over how multiple processes coexist and how to get information about them in Linux. You can gloss over the details in the section on Signals, but otherwise this is all good and useful info for us.
Pay special attention to job control: how to start, stop (pause) and kill a process, and how to execute processes in the background.
Python in a Nutshell Chapter 15: Concurrency
This is not one of our “usual” Python textbooks, but it’s still a good one for the level we are at.
3 Bash
3.1 Processes
A process is a running program. When you write code in Python or another programming language, you’ve created a program. Actually executing/running that program creates a process.
We use a lot of human/family terminology when talking about processes:
Processes have a lifetime. They are created when they start running, then they are alive while running, and then they might either terminate normally or be killed by another command.
Each process has a parent, which is the process that started it. For example, when you open a terminal in VS Code, the code process is the parent of the bash process in the terminal. If you run a command like grep in the terminal, then that bash process will be the parent of grep. Similarly, if a running program starts another one, that new process is a child of the parent process.
This goes further to things like “orphans” and “widows”, but we won’t get into that for this course.
How can we refer to a process? We could use the command that the process is running, but that’s not enough, since (for example) there could be multiple instances of the same program running at once.
- Every process has a unique process identifier or PID. It is a number, usually 6 or 7 digits on modern versions of Linux.
- Every process also has a parent, which is the process that launched it. The parent process ID is called the PPID.
- The only exception is the process with PID 1, which is the first process started by the OS kernel. On modern Linux, this process is called systemd. It does not have a parent, and its PPID is listed as 0. If you follow the parents-of-parents of any Linux process, you will eventually get back to PID 1.
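One quick way to see these numbers for yourself (a sketch; the actual PID values will of course differ on your machine): bash stores its own PID in the special variable $$, and the -f option to ps shows both the PID and PPID columns.

```shell
echo $$       # the PID of this bash process
ps -fp $$     # full listing for this process; the PPID column shows its parent
ps -fp 1      # PID 1 (systemd on modern Linux); its PPID is shown as 0
```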
3.2 Backgrounding
When you run a command in bash like

roche@ubuntu$ wget 'https://www.theinspiration.com/wp-content/uploads/016-3-1074x1074.webp'

the default behavior of bash is to run that program as a new process in the foreground. That means that the program runs in the terminal for as long as it takes, and bash waits patiently for it to finish. When the process is finished, bash steps in again and gives you a new prompt.
Instead, we can add the ampersand & at the end of any command line or
after a line in a bash script to run that process in the background. A
background process runs just like a foreground process. The difference
is with bash itself: instead of waiting for the process to finish, bash
instead lets the process go on in the background and immediately gives
you a new prompt.
This is our first foray into multiprocessing, where we can get the
computer to do multiple things at the same time. When you execute a
program in the background in bash (with the &), you can then
immediately start another process in the background, and another, etc.
The special command wait will pause the current bash terminal or bash
script until all background commands are finished.
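For example, the following script (a sketch, with sleep standing in for real work such as downloads) launches three background processes and then waits for all of them:

```shell
#!/bin/bash
# Three "jobs" running at the same time in the background
sleep 2 &
sleep 2 &
sleep 2 &
wait               # pause until all three background processes finish
echo "all done"    # appears after about 2 seconds, not 6
```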
3.3 Signals (killing)
Normally, each of the many processes running at a given time is blissfully unaware of the hundreds of other processes that are also competing for limited hardware resources. The operating system partitions memory and schedules the CPU and I/O devices so that each process gets what it needs without interference.
But of course sometimes we need to communicate between processes! One way to do this which we have already seen is a bash pipeline, where the output stream of one process is hooked into the input stream of another one.
In Linux, signals are used to “interrupt” a process, usually triggered
by another process or by the OS. There are only a small number of
possible signals, with specific meanings, and each with a numeric code.
You can type man 7 signal to learn about all of them.
For us, there are three signals we should know about:
- SIGINT (2): "Interrupt" a process. This is the signal that gets sent when you type Ctrl-C while a process is running.
- SIGTERM (15): Nicely ask to terminate a process. This is the default signal sent with the kill command.
- SIGKILL (9): Forcibly terminate ("kill") a process.
The first signal SIGINT (Ctrl-C) can be ignored by a program if it wants to. For example, if you type Ctrl-C at the bash terminal without any program currently running, bash just ignores it and gives you another prompt.
The last one, SIGKILL, cannot be ignored. If you send a SIGKILL, it will end that process immediately, even if it's in the middle of doing something super-important.
The kill command can be used to send a signal to a specific process.
Here is an example of running a wget download command in the
background, then using ps to see the PID of that wget command as it
runs, and then kill to send a Ctrl-C signal to make wget quit early:
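A sketch of such a session (with a long sleep standing in for the wget download so it runs anywhere; bash stores the PID of the most recent background process in the special variable $!):

```shell
sleep 300 &    # stand-in for a long wget download, run in the background
pid=$!         # $! holds the PID of the most recent background process
ps $pid        # confirm it is still running
kill -2 $pid   # send SIGINT, the same signal as typing Ctrl-C
wait $pid      # reap the child; it has now quit early
```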

3.4 Useful commands
- ps: Get info about currently running processes in the current terminal. Some variants/options:
  - ps -A: Get info about all processes, not just the ones in the current terminal
  - ps -f: Display a full listing that includes the PPID
  - ps 123456: Show info about the process with PID 123456
- top: Show a live update of all processes running, sorted by default by CPU usage. (Similar to Windows Task Manager, but more nerdy.)
- htop: Better, more colorful version of top
- &: Not a command, but can be added to the end of any command or pipeline to make it run in the background
- wait: Pause execution until all background processes terminate
- sleep: Pause for a given number of seconds
- kill: Send a signal (default is SIGTERM) to a process with the given PID
  - kill -2 123456: Ahem hello process 123456, I'd like to get your attention. Whoops, did I startle you and cause your death?
  - kill -15 123456: Process 123456, please die when you get a free moment.
  - kill -9 123456: I'm not asking anymore. You're dead.
4 Python
In Python we can do something very similar to running a command in the background in bash, by executing a Python function in a separate Python process (multiprocessing).
4.1 Multiprocessing
Consider the following program, which finds and prints out the smallest prime numbers after 100,000, after 200,000, etc.:
def is_prime(p):
    """Returns whether or not p is a prime number."""
    # (Note, this function is intentionally slow, for demo purposes!)
    d = 2
    while d < p:
        if p % d == 0:
            # p is divisible by d, so not prime
            return False
        d += 1
    # p has no divisors between 1 and p
    return True

def next_prime(n):
    """Computes and prints out the next largest prime after n."""
    p = n
    while not is_prime(p):
        p += 1
    print(p)

if __name__ == '__main__':
    for x in range(1000000, 10000000, 1000000):
        next_prime(x)

Don't worry too much about the details of primality checking. But notice
the general structure: we are calling a function next_prime
repeatedly, with different input arguments, and that function will
print out the results of each call.
This is an ideal candidate for multiprocessing! Here is what that same
program looks like when each of the function calls to next_prime is
run in its own process:
from primes import next_prime
from multiprocessing import Process

if __name__ == '__main__':
    children = []
    for x in range(1000000, 10000000, 1000000):
        child = Process(target=next_prime, args=[x])
        child.start()
        children.append(child)
    for child in children:
        child.join()
    print("All done")

Notice a few components, which will be common for just about every time you want to use multiprocessing in Python:
- Need to import the Process class from the multiprocessing library
- Always use an if __name__ == '__main__' block to run multiprocessing stuff. (Otherwise, you risk each of the children spawning more and more children on some operating systems.)
- Use the Process constructor with two named arguments:
  - target is the name of the function you want to call
  - args is a list or tuple of the arguments that will be passed to that function
- Save your Process objects in a list (for later use)
- Call the .start() method on each Process. (If you forget this step, then they won't actually run and your program will do nothing!)
- Later, in a separate loop, call the .join() method on each process. This causes the parent process to wait for each child to finish. (Make sure you understand why it needs to be a separate loop rather than calling .join() right after .start().)
You can run the program above on the command line with the time
built-in to test it for yourself. Here it is on my 4-core laptop. First
the original version without multi-processing:
roche@ubuntu$ time python3 primes.py
1000003
2000003
3000017
4000037
5000011
6000011
7000003
8000009
9000011

real	0m3.281s
user	0m3.269s
sys	0m0.012s
What we see here is first the actual output (the 9 prime numbers), in order as expected. Then the timing results say that this took about 3.3 seconds of “real” time, of which almost all of that was spent with the CPU working (“user” time measures normal CPU usage).
Now look carefully at what happens when we run the multi-processing version instead:
roche@ubuntu$ time python3 primes-mp.py
1000003
2000003
3000017
4000037
5000011
7000003
6000011
8000009
9000011
All done

real	0m1.151s
user	0m5.936s
sys	0m0.047s
There are some key differences here:
It got faster! The real time it took was cut to roughly a third, about 1.2 seconds.
The output is not in order. In fact, if I run this again, the order will be slightly different each time. That is because each process is running simultaneously, so they can finish (and print their results) in different orders on different runs.
The CPU time went up. The “user” time measures the TOTAL CPU usage across all processes. Because there is some “overhead” cost for starting a new process, this can be higher than the single-process version. But because the CPU time is much larger than the “real” time, that’s a great indication that our program is taking good advantage of the 4-core CPU in my laptop.
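To interpret numbers like these on your own machine, it helps to know how many CPU cores you have; Python's standard library can tell you:

```python
import os

# Number of CPU cores visible to Python (e.g. 4 on the laptop above)
print(os.cpu_count())
```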
4.2 ProcessPoolExecutor
There is a simpler abstraction over the low-level Process objects in
Python, the ProcessPoolExecutor from the concurrent.futures
library that is part of standard Python.
This is currently the best way to write multiprocessing code in Python and this is how we recommend you write parallel code in Python for SD212.
There are four main steps to designing a parallel program in Python using ProcessPoolExecutor:
1. Write a function to do some "chunk" of the overall work. This depends on the problem you are actually solving, but the idea is that each function call will be a single "task" that a parallel process solves. Designing what this function does, what arguments it takes, and what it returns, is the most important part of making a good parallel program.
2. Inside an if __name__ == '__main__' block, start the "worker" processes with a with statement like

   with ProcessPoolExecutor(5) as exe:

   The 5 here is the number of "worker" processes that will be started to handle the tasks that come later.
3. Inside that with statement, write a loop to start each task and add them to a list, using calls to exe.submit(). Crucially, we pass the name of the function to exe.submit(), followed by the arguments. This is important because we are not trying to actually perform this function call directly, but just pass it along as a task to a worker process.
4. After all the tasks have been started, write another loop to call task.result() on each task, returning whatever that corresponding function call returned. There will often be some post-processing necessary with the return values.
Here is a version of the primes program we have been using as a running example, this time using ProcessPoolExecutor and showing each step we need to think about:
from primes import is_prime
### STEP 0: Import ProcessPoolExecutor
from concurrent.futures import ProcessPoolExecutor

### STEP 1: Write the function for each task
def next_prime(n):
    """Computes and prints out the next largest prime after n."""
    p = n
    while not is_prime(p):
        p += 1
    print(p)

if __name__ == '__main__':
    ### STEP 2: with statement to start the PPE workers
    with ProcessPoolExecutor(5) as exe:
        ### STEP 3: Loop to submit all the tasks
        tasks = []
        for x in range(1000000, 10000000, 1000000):
            t = exe.submit(next_prime, x)
            tasks.append(t)
        ### STEP 4: Separate loop to call result() on each task
        for task in tasks:
            task.result()
    print("All done")

4.3 Tasks vs workers
One of the cool advantages of the ProcessPoolExecutor is that we get to separate the idea of tasks from worker processes in our program.
The worker processes are started by the with statement, like

    with ProcessPoolExecutor(8) as exe:

Usually, we want to tune this to correspond to the actual hardware resources available (CPU cores).
The tasks are started inside the with statement, by calling submit() in some kind of loop, like

    for x in range(1000000, 10000000, 1000000):
        exe.submit(next_prime, x)

The number of tasks might be exactly the same as the number of worker processes, or it might be a little bit larger. If there are a few more tasks than workers, then the ProcessPoolExecutor will assign some of the tasks initially, then assign the remaining tasks one by one as each worker finishes its initial task, and so on.
The point here is that we can separate the physical CPU constraints from the logic of the program. Does your problem naturally split into 20 sub-problems? Great - then you will submit 20 tasks. If the number of actual worker processes is only 8, it’s fine; those 8 workers will do 8 tasks at a time, eventually completing all 20 tasks.
4.4 Communication back to the main process
This is important. In the next_prime example above, each “task”
function just needs to print something out, independently of all the
other tasks. That means that each call to task.result() returns
None, and the program becomes a bit simpler because there isn’t any
“post-processing” that the parent process has to do.
Now let’s look at an example where that does happen. Here is a program that uses this API to retrieve 20 cat facts, and print out the longest one.
import requests
requests.packages.urllib3.disable_warnings()

def cat_fact():
    """Retrieves and returns a random cat fact from the API."""
    resp = requests.get('https://catfact.ninja/fact', verify=False)
    return resp.json()['fact']

longest = None
for _ in range(20):
    fact = cat_fact()
    if longest is None or len(fact) > len(longest):
        longest = fact
print(longest)

Notice crucially that we don't want to print out every fact. Each task should be retrieving a single fact, but then the main process needs to do the logic to find the longest one.
Here is that same program, now augmented to use a ProcessPoolExecutor:
import requests
from concurrent.futures import ProcessPoolExecutor
requests.packages.urllib3.disable_warnings()

def cat_fact():
    """Retrieves and returns a random cat fact from the API."""
    resp = requests.get('https://catfact.ninja/fact', verify=False)
    return resp.json()['fact']

if __name__ == '__main__':
    longest = None
    with ProcessPoolExecutor() as exe:
        tasks = []
        for _ in range(20):
            tasks.append(exe.submit(cat_fact))
        for task in tasks:
            fact = task.result()
            if longest is None or len(fact) > len(longest):
                longest = fact
    print(longest)

Notice how the logic to "combine" the results from each task is shifted inside the with statement.
Also notice how (crucially!) there is a separate loop to start all the
tasks with exe.submit(), before we begin the separate loop to call
result() on each one.
Try downloading and running these yourself - you should see that the parallel version works many times faster by essentially performing multiple downloads at the same time.