Parallelism means doing multiple things at once. Writing your programs in a parallel way can make them faster in two scenarios: (1) when much of the "work" is spent waiting (for example, sleeping or waiting on I/O), and (2) when your computer has multiple cores that can each do real computation at the same time.
There are many ways to write parallel code. We'll use Python's multiprocessing pools, which start multiple Python processes, then split work across them.
First, we'll create a function that is artificially slow.
from time import sleep, time

def add(xy):
    sleep(0.1)  # you could imagine this is some complicated, slow calculation
    return xy[0] + xy[1]

t0 = time()
print("result:", add((2,3)))
t1 = time()
print(t1 - t0, "seconds")
If we add 10 pairs of numbers sequentially (meaning one at a time -- the opposite of in parallel), it will take 10 times that long (~1 second total). A loop is one way to perform this task sequentially.
xy_pairs = [(10,1), (10,2), (10,3), (10,4), (10,5), (3,3), (2,2), (1,1), (1,3), (4,3)]

t0 = time()
for xy in xy_pairs:
    print("result:", add(xy))
t1 = time()
print(t1 - t0, "seconds")
Python's map function calls a function for us on each input, yielding each result, but it still works sequentially.
t0 = time()
for result in map(add, xy_pairs):
    print("result:", result)
t1 = time()
print(t1 - t0, "seconds")
Remember that map works with function references, something like this:
def mymap(func_ref, inputs):
    for v in inputs:
        result = func_ref(v)
        yield result
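To check that mymap behaves like the built-in map, you could time it on the same inputs (this snippet just reuses add and xy_pairs from above). Since it still calls the function one input at a time, it should also take about a second.

t0 = time()
for result in mymap(add, xy_pairs):
    print("result:", result)
t1 = time()
print(t1 - t0, "seconds")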
Python's multiprocessing module has a Pool class. A pool is a collection of processes. We can create a pool with a specified number of processes (say 5), then use the pool's version of map, which IS parallel (each process does a subset of the work).
from multiprocessing import Pool

with Pool(5) as p:
    t0 = time()
    for result in p.map(add, xy_pairs):
        print("result:", result)
    t1 = time()
    print(t1 - t0, "seconds")
It looks like using 5 processes made it 5 times faster! Let's try 10 processes.
with Pool(10) as p:
    t0 = time()
    for result in p.map(add, xy_pairs):
        print("result:", result)
    t1 = time()
    print(t1 - t0, "seconds")
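If you're curious how the speedup scales, here is a quick sketch (reusing add and xy_pairs from above; the exact times will vary by machine) that times the same p.map call with several pool sizes:

for n in [1, 2, 5, 10, 20]:
    with Pool(n) as p:
        t0 = time()
        p.map(add, xy_pairs)  # same work each time, split across n processes
        t1 = time()
    print(n, "processes:", round(t1 - t0, 2), "seconds")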
We're getting good speedups as we add more processes because most of the "work" is sleeping, and we can do plenty of that even with limited cores. The story changes if we need to do actual computation (for example, looping 3 million times):
def add(xy):
    for i in range(3000000):  # 3 million
        pass
    return xy[0] + xy[1]
with Pool(1) as p:
    t0 = time()
    for result in p.map(add, xy_pairs):
        print("result:", result)
    t1 = time()
    print(t1 - t0, "seconds (1 process)")

with Pool(10) as p:
    t0 = time()
    for result in p.map(add, xy_pairs):
        print("result:", result)
    t1 = time()
    print(t1 - t0, "seconds (10 processes)")
The difference between the speed with 1 process vs. 10 processes will depend on the number of cores. If there is only 1 core, the times are probably about the same. If there are 10 or more cores, the second measurement will probably be about ten times faster. In most cases, it will be somewhere in between.
Based on the two times above, can you estimate how many cores your instructor's laptop has?
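Rather than guessing, you can ask Python directly: the multiprocessing module provides a cpu_count function that reports the number of CPUs the operating system sees.

from multiprocessing import cpu_count
print("cores:", cpu_count())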
Using multiple processes for parallelism is tricky because each process has its own variables/state. The pool takes care of passing in inputs and returning outputs, but avoid modifying global variables. They'll get modified in the pool's processes, but not in the process running your Jupyter notebook (which is what people making this mistake are usually trying to do).
total = 0

def increment(amt):
    global total
    total += amt
    print("SUB TOTAL SO FAR", total)

with Pool(2) as p:
    p.map(increment, [3,2,9,1])
There are two processes, each with its own total global variable, so even though they each do some counting, they never see the total of 15. Worse, the process that made the map call (this Jupyter notebook) has yet another total global variable, which didn't get modified at all.
total
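The pool-friendly fix is to avoid shared state: have the worker function return a value, then combine the returned values back in the notebook process. Here is a minimal sketch of that idea (give_back is just an illustrative name):

def give_back(amt):
    # no global state; just return the value to the parent process
    return amt

with Pool(2) as p:
    total = sum(p.map(give_back, [3,2,9,1]))
print("TOTAL", total)  # 15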