If the machine running the code has multiple cores, multiprocessing (running code in multiple child processes) lets tasks run in parallel, which reduces the overall execution time.
Sequential execution
testFunction is an example function that sleeps for a given number of seconds. In the following snippet, we call this function twice, one call after the other. Each call takes 2 seconds to complete, so the total execution time is around 4 seconds.
from time import time, sleep

def testFunction(sleepTime):
    print(f'Sleeping ... Waking up in {sleepTime} second(s)')
    sleep(sleepTime)
    print(f'Woke up from sleep of {sleepTime} second(s).')

startTimestamp = time()
testFunction(2)
testFunction(2)
endTimestamp = time()
executionTimeInSeconds = endTimestamp - startTimestamp
print(f'Total execution time in seconds: {executionTimeInSeconds}')
The following logs are from a run of the snippet above. The execution takes around 4 seconds.
> python ./test_for_blog.py
Sleeping ... Waking up in 2 second(s)
Woke up from sleep of 2 second(s).
Sleeping ... Waking up in 2 second(s)
Woke up from sleep of 2 second(s).
Total execution time in seconds: 4.004408836364746
multiprocessing.Process class
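The Process class of the multiprocessing module creates a child process and runs code in it. Its constructor takes keyword arguments; the following snippet uses target (the callable to run in the child process), args (a tuple of positional arguments for the target), kwargs (a dictionary of keyword arguments for the target), and name (a label for the process).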
from time import time, sleep
import multiprocessing as mp

def testFunction(sleepTime):
    print(f'Sleeping ... Waking up in {sleepTime} second(s)')
    sleep(sleepTime)
    print(f'Woke up from sleep of {sleepTime} second(s).')

if __name__ == '__main__':
    startTimestamp = time()
    p1 = mp.Process(target=testFunction, args=(2,), name='Child process 1')
    p2 = mp.Process(target=testFunction, kwargs={'sleepTime': 2}, name='Child process 2')
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    endTimestamp = time()
    executionTimeInSeconds = endTimestamp - startTimestamp
    print(f'Total execution time in seconds: {executionTimeInSeconds}')
In the above snippet, we create 2 processes and call our target function once in each of them. Let’s discuss the snippet in more detail.
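The if __name__ == '__main__' condition wraps all our multiprocessing code. With the spawn start method (the default on Windows and macOS), each child process imports the main module again, and the condition keeps the process-creating code from running during that import. If we do not use this condition under spawn, we get a runtime error saying that a new process was started before the current process finished bootstrapping.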
The start() method starts the child process, which then calls the target.
The join() method blocks the parent process until the child process terminates. Note: If you create new processes in a loop, do not call join() inside that same loop. Each iteration would wait for its child to finish before starting the next one, and the code would behave sequentially.
Following are the logs from the run of the above snippet.
> python ./test_for_blog.py
Sleeping ... Waking up in 2 second(s)
Sleeping ... Waking up in 2 second(s)
Woke up from sleep of 2 second(s).
Woke up from sleep of 2 second(s).
Total execution time in seconds: 2.070202112197876
With execution in multiple processes, the total execution time comes to around 2 seconds, instead of the 4 seconds needed for sequential execution. Parallel processing saved time.
Python 3 NEWS for multiprocessing
Since Python 3.8, spawn has been the default process start method on macOS, replacing fork. On macOS, the fork method should be considered unsafe, as it can lead to crashes in subprocesses.
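To see which start method a given interpreter uses, the multiprocessing module can be queried directly. The sketch below uses only the standard library; describeStartMethods and sayHello are hypothetical names for illustration. It also shows get_context('spawn'), which selects a start method for one group of processes without changing the global default.

```python
import multiprocessing as mp


def describeStartMethods():
    """Return the default start method and all methods available on this platform."""
    return mp.get_start_method(), mp.get_all_start_methods()


def sayHello():
    print('Hello from the child process')


if __name__ == '__main__':
    default, available = describeStartMethods()
    print(f'Default start method: {default}')
    print(f'Available start methods: {available}')

    # A context exposes the same API as the multiprocessing module,
    # but is bound to a single start method ('spawn' here).
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=sayHello)
    p.start()
    p.join()
```

The printed values depend on the operating system and Python version, so they are not listed here.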
Stay tuned
This was the first post in a series on Python multiprocessing. Stay tuned for the next post, which will explore process pools.