Subscribe / Unsubscribe Enewsletters | Login | Register

Pencil Banner

Multicore Python: A tough, worthy, and reachable goal

Serdar Yegulalp | June 30, 2016
Python has been held back by its inability to natively use multiple CPU cores. Now Pythonistas are aiming to find a solution

For all of Python's great and convenient features, one goal remains out of reach: Python apps running on the CPython reference interpreter and using multiple CPU cores in parallel.

This has long been one of Python's biggest stumbling blocks, especially since all of the workarounds are clumsy. The urgency to find a long-term solution to the issue is growing, notably as core counts on processors continue to ramp up (see Intel's 24-core behemoth).

One lock for all 

In truth, it's possible to use threads in Python applications -- plenty of them already do. What's not possible is for CPython to run multithreaded applications with each thread executing in parallel on a different core. CPython's internal memory management isn't thread-safe, so the interpreter runs only one thread at a time, switching between them as needed and controlling access to the global state.

This locking mechanism, the Global Interpreter Lock (GIL), is the single biggest reason why CPython can't run threads in parallel. There are some mitigating factors; for instance, I/O operations like disk or network reads are not bound by the GIL, so those can run freely in their own threads. But anything both multithreaded and CPU-bound is a problem.

For Python programmers, this means heavy computational tasks that benefit from being spread out across multiple cores don't run well, barring the use of an external library. The convenience of working in Python comes at a major performance cost, which is becoming harder to swallow as faster, equally convenient languages like Google's Go come to the fore.

Pick the lock

Over time, a slew of options have emerged that ameliorate -- but do not eliminate -- the limits of the GIL. One standard tactic is to launch multiple instances of CPython and share context and state between them; each instance runs independently of the other in a separate process. But as Jeff Knupp explains, the gains provided by running in parallel can be lost by the effort needed to share state, so this technique is best suited to long-running operations that pool their results over time.

C extensions aren't bound by the GIL, so many libraries for Python that need speed (such as the math-and-stats library Numpy) can run across multiple cores. But the limitations in CPython itself remain. If the best way to avoid the GIL is to use C, that will drive more programmers away from Python and toward C.

PyPy, the Python version that compiles code via JIT, doesn't get rid of the GIL but makes up for it by simply having code run faster. In some ways this isn't a bad substitute: If speed is the main reason you've been eyeing multithreading, PyPy might be able to provide the velocity without the complications of multithreading.


1  2  3  Next Page 

Sign up for CIO Asia eNewsletters.