Unit 9: Concurrency
1 Overview
So far all of the programs and shell scripts we have written are inherently serial: do one thing, then another thing, then another thing, etc. until the job is done.
But this is a wasteful use of the powerful computing resources at our disposal! As we learned in the previous unit, the operating system on a typical laptop or desktop manages a few hundred processes running at the same time — why should just one of them be doing useful “work” on some data science task?
In this unit, we will get a small glimpse into the world of concurrent programming, where we write code to execute more than one thing at the same time. Starting with bash, we will first learn how to manage multiple interacting processes on the command line and the “familial” relationships between them. Then we will see how to do the same thing with Python’s multiprocessing library. Finally, we will learn about multi-threading and see that Python’s threading library, while superficially similar to multiprocessing, has a few key advantages and disadvantages.
As with many things we cover in your intro classes, this is not an exhaustive view of the subject of parallel programming! The art of designing efficient concurrent programs is something you could take multiple whole courses on. Our goal here is to introduce the main concepts of multiprocessing and multithreading, and to see some of the advantages (and dangers) they can provide.
2 Resources
The Linux Command Line Chapter 10: Processes (Required)
This chapter of TLCL goes over how multiple processes coexist and how to get information about them in Linux. You can gloss over the details in the section on Signals, but otherwise this is all good and useful info for us.
Pay special attention to job control: how to start, stop (pause) and kill a process, and how to execute processes in the background.
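As a quick taste of job control, here is a sketch (assuming bash) of starting a process in the background, listing it, and killing it by job number:

```shell
sleep 100 &     # start a long-running process in the background
jobs            # list this shell's jobs: something like "[1]+ Running  sleep 100 &"
kill %1         # %1 refers to job number 1; kill sends SIGTERM by default
```

From an interactive terminal you can also pause a foreground job with Ctrl-Z and resume it with fg or bg, as described in the chapter.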
Python in a Nutshell Chapter 15: Concurrency
This is not one of our “usual” Python textbooks, but it’s still a good one for the level we are at.
Here’s what to focus on within this (long) chapter:
- The introduction, which discusses multithreading vs multiprocessing. This also discusses the important concept of Python’s Global Interpreter Lock (GIL).
- The section on the threading module, for how to use Thread objects.
- The section on the multiprocessing module, for how to use Process objects and the role of pickling.
The other sections of this chapter are all good info, but more detail than we have time to fully cover in SD212.
3 Bash
3.1 Processes
A process is a running program. When you write code in Python or another programming language, you’ve created a program. Actually executing/running that program creates a process.
We use a lot of human/family terminology when talking about processes:
Processes have a lifetime. They are created when they start running, then they are alive while running, and then they might either terminate normally or be killed by another command.
Each process has a parent, which is the process that started it. For example, when you open a terminal in VS Code, the code process is the parent of the bash process in the terminal. If you run a command like grep in the terminal, then that bash process will be the parent of grep.
Similarly, if a running program starts another one, that new process is a child of the parent process.
This goes further to things like “orphans” and “widows”, but we won’t get into that for this course.
How can we refer to a process? We could use the command that the process is running, but that’s not enough, since (for example) there could be multiple instances of the same program running at once.
- Every process has a unique process identifier or PID. It is a number, usually 6 or 7 digits on modern versions of Linux.
- Every process also has a parent, which is the process that launched it. The parent process ID is called the PPID.
- The only exception is the process with PID 1, which is the first process started by the OS kernel. On modern Linux, this process is called systemd. It does not have a parent, and its PPID is listed as 0. If you follow the parents-of-parents of any Linux process, you will eventually get back to PID 1.
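You can see these numbers for your own shell: $$ expands to the current bash process’s PID, and ps can show its parent:

```shell
echo "my shell's PID is $$"       # $$ expands to this process's PID
ps -o pid,ppid,comm -p $$         # show its PID, parent PID (PPID), and command name
```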
3.2 Backgrounding
When you run a command in bash like
roche@ubuntu$
wget 'https://www.theinspiration.com/wp-content/uploads/016-3-1074x1074.webp'
the default behavior of bash is to run that program as a new process in the foreground. That means that the program runs in the terminal for as long as it takes, and bash waits patiently for it to finish. When the process is finished, bash steps in again and gives you a new prompt.
Instead, we can add the ampersand & at the end of any command line or after a line in a bash script to run that process in the background. A background process runs just like a foreground process. The difference is with bash itself: instead of waiting for the process to finish, bash lets the process go on in the background and immediately gives you a new prompt.
This is our first foray into multiprocessing, where we can get the computer to do multiple things at the same time. When you execute a program in the background in bash (with the &), you can then immediately start another process in the background, and another, etc. The special command wait will pause the current bash terminal or bash script until all background commands are finished.
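To see the effect, here is a small sketch: three one-second sleep commands run in the background, so with wait the whole thing takes about one second rather than three:

```shell
start=$(date +%s)
sleep 1 &        # each sleep runs in the background...
sleep 1 &
sleep 1 &
wait             # ...so they overlap, and wait pauses until all three finish
end=$(date +%s)
echo "elapsed: $((end - start)) seconds"
```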
3.3 Signals (killing)
Normally, each of the many processes running at a given time is blissfully unaware of the hundreds of other processes that are also competing for limited hardware resources. The operating system partitions memory and schedules the CPU and I/O devices so that each process gets what it needs without interference.
But of course sometimes we need to communicate between processes! One way to do this, which we have already seen, is a bash pipeline, where the output stream of one process is hooked into the input stream of another one.
In Linux, signals are used to “interrupt” a process, usually triggered by another process or by the OS. There are only a small number of possible signals, each with a specific meaning and a numeric code. You can type man 7 signal to learn about all of them.
For us, there are three signals we should know about:
- SIGINT (2): “Interrupt” a process. This is the signal that gets sent when you type Ctrl-C while a process is running.
- SIGTERM (15): Nicely ask to terminate a process. This is the default signal sent with the kill command.
- SIGKILL (9): Forcibly terminate (“kill”) a process.
The first signal SIGINT (Ctrl-C) can be ignored by a program if it wants to. For example, if you type Ctrl-C from the bash terminal without any program currently running, bash just ignores it and gives you another prompt.
The last one, SIGKILL, cannot be ignored. If you send a SIGKILL it will end that process immediately, even if it’s in the middle of doing something super-important.
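A quick way to see that SIGINT is ignorable: bash’s trap command lets a script catch the signal and carry on. This sketch sends SIGINT to the script itself:

```shell
trap 'echo "caught SIGINT, carrying on"' INT   # install a handler for signal 2
kill -2 $$                                     # send SIGINT to our own process
echo "still alive"                             # we get here: the signal was caught
```

(Trying the same with SIGKILL has no effect: signal 9 can never be caught.)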
The kill command can be used to send a signal to a specific process. Here is an example of running a wget download command in the background, then using ps to see the PID of that wget command as it runs, and then kill to make wget quit early:
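What follows is a sketch of that pattern, with sleep 60 standing in for the long wget download (note that background jobs in a non-interactive script typically ignore SIGINT, so this sketch uses kill’s default SIGTERM; from an interactive terminal, kill -2 would send the Ctrl-C signal instead):

```shell
sleep 60 &       # stand-in for: wget 'https://...' &
pid=$!           # bash stores the PID of the last background process in $!
ps               # the background process appears here with its PID
kill "$pid"      # ask it to terminate early (default signal: SIGTERM)
wait "$pid"      # exit status 143 = 128 + 15: terminated by signal 15
echo "quit early with status $?"
```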
3.4 Useful commands
- ps: Get info about currently running processes in the current terminal. Some variants/options:
  - ps -A: Get info about all processes, not just the ones in the current terminal
  - ps -f: Display a full listing that includes the PPID
  - ps 123456: Show info about the process with PID 123456
- top: Show a live update of all processes running, sorted by default by CPU usage. (Similar to Windows Task Manager, but more nerdy.)
- htop: Better, more colorful version of top
- &: Not a command, but can be added to the end of any command or pipeline to make it run in the background
- wait: Pause execution until all background processes terminate
- sleep: Pause for a given number of seconds
- kill: Send a signal (default is SIGTERM) to a process with the given PID
  - kill -2 123456: Ahem hello process 123456, I’d like to get your attention. Whoops, did I startle you and cause your death?
  - kill -15 123456: Process 123456, please die when you get a free moment.
  - kill -9 123456: I’m not asking anymore. You’re dead.
4 Python
In Python we can do something very similar to running a command in the background in bash, by executing a Python function in one of two ways: either in a separate Python process (multiprocessing) or in a new thread that is part of the same process (multithreading).
We will see how to do both, and what the pros and cons of each approach are.
4.1 Multiprocessing
Consider the following program, which finds and prints out the smallest prime numbers after 1,000,000, after 2,000,000, etc.:
def is_prime(p):
    """Returns whether or not p is a prime number."""
    # (Note, this function is intentionally slow, for demo purposes!)
    d = 2
    while d < p:
        if p % d == 0:
            # p is divisible by d, so not prime
            return False
        d += 1
    # p has no divisors between 1 and p
    return True

def next_prime(n):
    """Computes and prints out the next largest prime after n."""
    p = n
    while not is_prime(p):
        p += 1
    print(p)

if __name__ == '__main__':
    for x in range(1000000, 10000000, 1000000):
        next_prime(x)
Don’t worry too much about the details of primality checking. But notice the general structure: we are calling a function next_prime repeatedly, with different input arguments, and that function will print out the results of each call.
This is an ideal candidate for multiprocessing! Here is what that same program looks like when each of the function calls to next_prime is run in its own process:
from primes import next_prime
from multiprocessing import Process

if __name__ == '__main__':
    children = []
    for x in range(1000000, 10000000, 1000000):
        child = Process(target=next_prime, args=[x])
        child.start()
        children.append(child)
    for child in children:
        child.join()
    print("All done")
Notice a few components, which will be common for just about every time you want to use multiprocessing in Python:
- Import the Process class from the multiprocessing library.
- Always use an if __name__ == '__main__' guard to run multiprocessing stuff. (Otherwise, you risk each of the children spawning more and more children on some operating systems.)
- Use the Process constructor with two named arguments:
  - target is the name of the function you want to call
  - args is a list or tuple of the arguments that will be passed to that function
- Save your Process objects in a list (for later use).
- Call the .start() method on each Process. (If you forget this step, then they won’t actually run and your program will do nothing!)
- Later, in a separate loop, call the .join() method on each process. This causes the parent process to wait for each child to finish. (Make sure you understand why it needs to be a separate loop rather than calling .join() right after .start().)
You can run the program above on the command line with the time built-in to test it for yourself. Here it is on my 4-core laptop. First the original version without multi-processing:
roche@ubuntu$
time python3 primes.py
1000003
2000003
3000017
4000037
5000011
6000011
7000003
8000009
9000011
real 0m3.281s
user 0m3.269s
sys 0m0.012s
What we see here is first the actual output (the 9 prime numbers), in order as expected. Then the timing results say that this took about 3.3 seconds of “real” time, of which almost all of that was spent with the CPU working (“user” time measures normal CPU usage).
Now look carefully at what happens when we run the multi-processing version instead:
roche@ubuntu$
time python3 primes-mp.py
1000003
2000003
3000017
4000037
5000011
7000003
6000011
8000009
9000011
All done
real 0m1.151s
user 0m5.936s
sys 0m0.047s
There are some key differences here:
It got faster! The real time was cut to about a third of the original, about 1.2 seconds.
The output is not in order. In fact, if I run this again, the order will be slightly different each time. That is because each process is running simultaneously, so they can finish (and print their results) in different orders on different runs.
The CPU time went up. The “user” time measures the TOTAL CPU usage across all processes. Because there is some “overhead” cost for starting a new process, this can be higher than the single-process version. But because the CPU time is much larger than the “real” time, that’s a great indication that our program is taking good advantage of the 4-core CPU in my laptop.
4.2 Multithreading
There is an alternative to multiprocessing in Python called multi-threading. It follows a nearly-identical syntax to multiprocessing. Here is how we can write the same example as above using threads instead of processes:
from primes import next_prime
from threading import Thread

if __name__ == '__main__':
    children = []
    for x in range(1000000, 10000000, 1000000):
        child = Thread(target=next_prime, args=[x])
        child.start()
        children.append(child)
    for child in children:
        child.join()
    print("All done")
Notice there are only two differences compared with the multiprocessing version:
- Import Thread from the threading library.
- Use the Thread class instead of the Process class.
That’s it! The Python multiprocessing and threading library authors intentionally designed the Process and Thread classes so that they can be used identically in code, which makes it easy to try each one.
But there are very important differences in how multithreading and multiprocessing work — don’t be fooled by the similar syntax! Here is a summary:
In multiprocessing, each Process runs separately with its own copies of all global variables. So if one process changes a global variable, it won’t be seen by the other processes.
In practice, this means that processes either have to print their results to the terminal, or write to files, or use some other mechanism (we will talk about pipes later) to communicate results back to the parent process.
Multithreading instead uses a shared memory, which means that all global variables are shared between all threads. This can make it much easier to send back data to the parent thread, just by appending to a global list or something like that.
Separate processes are truly independent and can take advantage of multi-core CPUs.
Separate threads are more inter-dependent, and something called the Global Interpreter Lock (GIL) in Python effectively prevents a multi-threaded program from using more than one CPU core at a time.
This is essentially a consequence of the shared memory model, but it obviously limits how much we can do with threading in Python.
What this means in practice is that multi-threading is most useful for I/O-based operations such as reading from disk or the Internet.
To see the unfortunate downside of the GIL in action, here is what happens when we run the multithreaded version:
roche@ubuntu$
time python3 primes-t.py
1000003
2000003
3000017
5000011
4000037
7000003
6000011
8000009
9000011
All done
real 0m3.408s
user 0m3.402s
sys 0m0.032s
Once again, the output shows up out of order (and might change on each run of the program), which indicates multiple things are running at once.
But the time is the same (actually a little worse) than the original version that didn’t do any multiprocessing or multithreading! That is because this program is highly CPU-dependent. In particular, it is doing a whole bunch of arithmetic to try and find prime numbers. So because the Python GIL effectively prevents more than one CPU core from working at once, we won’t get much timing benefit from multithreading in this case.
4.3 Sharing values between threads
Let’s look at an example where multi-threading is actually useful! Remember that because of the GIL, multi-threading in Python will not be helpful for CPU-intensive programs. But for I/O-intensive ones, we can see some nice benefits. In these kind of programs, the “slowness” comes not from the CPU working hard, but just from waiting for data to come back from the web or from disk.
Here’s a program that uses multi-threading to retrieve 20 random cat facts from this API, then sorts them in alphabetical order and prints them out:
import requests
from threading import Thread

requests.packages.urllib3.disable_warnings()

facts = []

def cat_fact():
    global facts
    resp = requests.get('https://cat-fact.herokuapp.com/facts/random', verify=False)
    fact = resp.json()['text']
    facts.append(fact)

if __name__ == '__main__':
    children = []
    for _ in range(20):
        child = Thread(target=cat_fact, args=[])
        child.start()
        children.append(child)
    for child in children:
        child.join()
    facts.sort()
    for fact in facts:
        print(fact)
Notice a few important things about this program:
- Each function call appends its fact to the list facts. We specify global facts at the beginning of the function to make clear that it updates the global variable. (Strictly speaking, global is only required when assigning to a variable, and appending would work without it, but the declaration documents the intent.)
- When creating each Thread, we leave the args field as an empty list because the function we are calling, cat_fact, doesn’t take any arguments.
- Sorting the facts can only be done after all the responses are in. So inside the main part, we do facts.sort() only after the join() loop completes.
Because this function is mostly not CPU-bound (doing a lot of computation) but rather IO-bound (making web requests), multi-threading is very effective, resulting in about a 3x speedup compared to doing the requests one at a time.
Multi-threading is also the right choice here because of the shared memory used when each thread is attempting to append to a global list. If we ran exactly the same program but replaced Thread with Process, it would run without error but nothing would print at the end. That’s because in multiprocessing there is no shared memory, so each process would be appending to its own copy of the list, and the parent process at the end would be sorting and printing out an empty list.
4.4 Sharing values between processes using SimpleQueue
Let’s summarize what we know so far about multiprocessing vs multithreading:
| | Multiprocessing | Multithreading |
|---|---|---|
| Shared memory | Each process has a copy of global variables | Each thread has shared read/write access to the same global variables |
| GIL | Not affected by the global interpreter lock | GIL prevents multiple threads from executing CPU instructions simultaneously |
| CPU-bound tasks | Can effectively use multiple CPU cores | Not effective for CPU-intensive tasks because of the GIL |
| IO-bound tasks | Works well | Works well |
So basically, we know that for tasks which need to have some shared memory we have to use threads, and for tasks which are computing-based (CPU bound) we need to use multiprocessing.
But what about when we have a CPU-bound task which really needs some communication between the processes besides print statements? In bash, we can achieve this kind of inter-process communication (IPC) using pipes.
It turns out, we can do something equivalent in Python using special multiprocessing data structures that can be shared between multiple processes.
There are a few mechanisms for this in Python, but the only one we will look at is the SimpleQueue class in the multiprocessing library. Think of a SimpleQueue as a kind of “Python pipe” that can be used to write and read Python values between processes. (Actually, if you look at the documentation you will see that SimpleQueue uses a Pipe in its implementation, which is not a coincidence!)
- from multiprocessing import SimpleQueue
- Only once, in the parent process, create a new queue object with SimpleQueue()
- Pass this queue as an argument to the function calls that each Process runs
- Use the .put(x) method to add some value to the queue in one process
- Use the .get() method to remove and return a value from the queue in a different process
Usually, the queue is only used for one-way communication: either the child processes are putting things in and the parent is getting them out, or vice-versa.
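Those steps fit in a few lines. A minimal sketch, with one child putting a single value that the parent gets:

```python
from multiprocessing import Process, SimpleQueue

def child(queue):
    queue.put("hello from the child")   # child puts a value into the queue

if __name__ == '__main__':
    queue = SimpleQueue()               # created once, in the parent
    p = Process(target=child, args=[queue])
    p.start()
    print(queue.get())                  # blocks until the child's value arrives
    p.join()
```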
Here is a complete example program which is based on finding prime numbers like before, but with the difference that all of the child processes add their individual counts to a shared SimpleQueue, and the parent process removes these counts and adds them up:
from primes import is_prime
from multiprocessing import Process, SimpleQueue

def count_primes(start, end, queue):
    """Counts the primes p with start <= p < end and puts the count on the queue."""
    print(f"Counting primes between {start} and {end}...")
    count = 0
    for p in range(start, end):
        if is_prime(p):
            count += 1
    print(f"There are {count} primes between {start} and {end}")
    queue.put(count)

if __name__ == '__main__':
    # This will count primes in the ranges (10000,20000) and
    # then (20000,30000), etc. up to the range (90000,100000).
    start = 10000
    interval = 10000
    nprocs = 9
    # This object is only created ONCE but shared with all processes.
    # It is a very special data structure from the multiprocessing
    # library that can be used to communicate between processes.
    queue = SimpleQueue()
    # current_end is updated with how far we have gone in
    # starting the processes for each range.
    current_end = start
    # Start up each process.
    # Notice that we will use the queue to know when all the processes
    # are finished, so we don't have to save the Process objects into
    # a list.
    for _ in range(nprocs):
        p = Process(target=count_primes, args=[current_end, current_end+interval, queue])
        p.start()
        current_end += interval
    # Now wait for each process to finish and add up the results.
    total = 0
    for _ in range(nprocs):
        partial = queue.get()
        total += partial
        print(f"Adding {partial}; running sum is {total}")
    # Now everything is done, so print the grand total.
    print(f"The TOTAL number of primes between {start} and {current_end} is {total}")
Try running it! You should notice that as each process reports its count and adds it to the queue, the parent process immediately adds that to a running total. Crucially, because we know each child process will only add a single value to the queue and exit, the parent process can just have a single loop at the end to get() the correct number of values from the queue, corresponding to how many processes it launched at the beginning.
On my 6-core desktop, this cuts the running time from 33.5 seconds in a single-process version down to 8.1 seconds using multiprocessing. Great!
Remember: it would be easier to just use a shared global variable for the counter instead of the SimpleQueue, but shared global variables don’t work with multiprocessing. We could instead implement this with multithreading, which works and gets the right answer, but doesn’t run any faster than a simple non-parallel version because this is a CPU-intensive computation that is hampered by the GIL.
(Try it! If you change Process to Thread in the code above, it actually takes about twice as long as not using threads at all.)
In summary, we can often avoid inter-process communication through special data structures like SimpleQueue either by:
- Printing to the screen or writing to files in each process, rather than sending Python variables back to the parent process; or
- Using multithreading and global variables instead of multiprocessing, if the computation is IO-bound.
But when we have a CPU-intensive task which requires variable values to be sent back to the parent process (the counts in the prime counting example), then we can still do it by using multiprocessing and a SimpleQueue (or some other mechanisms we won’t cover in SD212).