Scaling Across CPUs

Multithreading

Scaling across processors is usually done using multithreading. Multithreading is the ability to run code in parallel using threads. Threads are usually provided by the operating system and are contained in a single process. The operating system is responsible for scheduling their execution.

Because threads can run in parallel, they can be executed on separate processors even though they are contained in a single process. However, if only one CPU is available, the operating system interleaves them: each thread runs for a short slice of time, one after another.

Therefore, when writing a multithreaded application, the code always runs concurrently but runs in parallel only if there is more than one CPU available.

This means that multithreading looks like a good way to scale and parallelize your application on one computer. When you want to spread the workload, you start a new thread for each new request instead of handling them one at a time.
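As a minimal sketch of the thread-per-request idea, the following starts one thread for each incoming request and waits for all of them to finish. The names handle_request and serve, as well as the fake "work" done by the handler, are assumptions made for illustration and do not come from any particular framework.

```python
import threading


def handle_request(request_id, results):
    # Stand-in for real work; an actual handler would parse and answer the request.
    results[request_id] = f"handled request {request_id}"


def serve(request_ids):
    results = {}
    threads = []
    for request_id in request_ids:
        # Start a new thread for each request instead of handling them one by one.
        thread = threading.Thread(target=handle_request, args=(request_id, results))
        thread.start()
        threads.append(thread)
    # Wait for every request to be handled before returning.
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(serve(range(3)))
```

Each thread writes to its own key in the shared dictionary, so no extra locking is needed in this particular sketch.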

[Figure: Multi-threaded program]

Drawbacks of multithreading

However, this approach has several drawbacks in Python. If you have been in the Python world for a long time, you have probably encountered the acronym GIL and know how hated it is. The GIL is the Python global interpreter lock, a lock that must be acquired each time CPython needs to execute bytecode. Unfortunately, this means that if you try to scale your application by running multiple threads, this global lock limits the performance of your code, because every thread competes for it: each one tries to grab the lock as soon as it needs to execute Python instructions.

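Consider, for example, a program in which a thread appends 2 to a shared list while the main thread appends 1 to the same list, and the list is printed once both are done. That code prints either [2, 1] or [1, 2] no matter what. While there is no way to know which thread appends its value before the other, there is an assumption built into Python that each list.append operation is atomic. If it were not atomic, memory corruption might arise and the list could end up containing only [1] or [2].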
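A minimal sketch of such a program, assuming one worker thread appends 2 while the main thread appends 1, might look like this. The function name run_once is hypothetical, and the join call is added here so that the list is only read after both appends have completed.

```python
import threading


def run_once():
    shared = []

    def append_two():
        shared.append(2)

    thread = threading.Thread(target=append_two)
    thread.start()
    # The main thread appends concurrently with the worker thread.
    shared.append(1)
    # Wait for the worker so that both appends are done before reading the list.
    thread.join()
    return shared


if __name__ == "__main__":
    # The order of the two elements depends on how the threads are scheduled.
    print(run_once())
```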

This guarantee holds because only one thread is allowed to execute a bytecode instruction at a time. That also means that if your threads run a lot of bytecode, there is heavy contention to acquire the GIL, and therefore your program cannot be faster than a single-threaded version. It could even be slower.
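One way to observe this is to time the same CPU-bound work run sequentially and then spread across threads. The sketch below is only illustrative: the function names and the workload sizes are assumptions, the measured timings vary from machine to machine, and the comparison only makes sense on a standard CPython build that has the GIL.

```python
import threading
import time


def count_down(n):
    # Pure-Python loop: it executes bytecode the whole time, so it holds the GIL.
    while n > 0:
        n -= 1


def timed_sequential(n, workers):
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def timed_threaded(n, workers):
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n, workers = 2_000_000, 4
    print(f"sequential: {timed_sequential(n, workers):.2f}s")
    print(f"threaded:   {timed_threaded(n, workers):.2f}s")
```

On a GIL-enabled interpreter, the threaded timing is typically no better than the sequential one, even with several cores available.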

Thread-safe operations

The easiest way to know whether an operation is thread-safe is to check whether it translates to a single bytecode instruction (bytecode being the low-level code, compiled from source code, that a software interpreter executes) or whether it uses a basic type whose operations are atomic. The list is provided in the Python FAQ.
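To check how an operation translates to bytecode, the standard dis module can be used. The sketch below, in which the helper name opcode_names is hypothetical, lists the instructions generated for incrementing a global variable. The exact opcode names differ between CPython versions, but the increment is compiled to separate load and store instructions, so it is not a single atomic step.

```python
import dis

counter = 0


def increment():
    global counter
    counter += 1


def opcode_names(func):
    # Return the name of every bytecode instruction compiled for func.
    return [instruction.opname for instruction in dis.get_instructions(func)]


if __name__ == "__main__":
    for name in opcode_names(increment):
        print(name)
```

A thread switch can happen between the load and the store, which is why such an increment needs protection when several threads perform it.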

So, while using threads seems like an ideal solution at first glance, most applications I have seen running multiple threads struggle to attain 150% CPU usage. That is to say, only 1.5 cores are used. With computing nodes nowadays rarely having fewer than four or eight cores, that is a shame. Blame the GIL.

Removing the GIL?

There is currently an effort underway (named gilectomy) to remove the GIL in CPython. Whether this effort will pay off is still unknown, but it is exciting to follow and see how far it will go.

However, CPython is just one of the available Python implementations, although it is the most common. Jython, for example, doesn’t have a global interpreter lock, which means that it can run multiple threads in parallel efficiently. Unfortunately, these projects by their very nature lag behind CPython, and so they are rarely practical targets.

Global variables - an infinite source of human errors

Multithreading involves several traps, and one of them is that all the pieces of code running concurrently share the same global environment and variables. Reading or writing global variables should only be done under the protection of techniques such as locking, which complicates your code. Moreover, shared state is an infinite source of human errors.
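As a minimal sketch of what locking looks like in practice, the following protects a global counter with a threading.Lock so that concurrent increments are not lost. The names counter, counter_lock, add_many, and run are assumptions chosen for illustration.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    global counter
    for _ in range(times):
        # Only one thread at a time may read, increment, and write the counter.
        with counter_lock:
            counter += 1


def run(workers=4, times=10_000):
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,)) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print(run())
```

Every piece of code that touches the counter has to remember to take the same lock, which is exactly the kind of discipline that is easy to get wrong.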

Getting multi-threaded applications right is hard. The level of complexity means that it is a large source of bugs. Considering the little to be gained in general, it is better not to waste too much effort on it.

Multiple processes

So, are we back to our initial use cases, with no real solutions on offer? Not true! There’s another solution you can use: multiple processes. This approach is more efficient and easier, as we will see in this lesson. It is also the first step before spreading across a network.
