What exactly is the GIL?
The Global Interpreter Lock is a lock inside the Python interpreter that ensures only one sequence of bytecode is executed by the Python VM at a time.
The initial reason for its creation is that the internal memory management of the Python interpreter is not thread-safe. That is: the C code that allocates memory when creating objects, even simple ones like integers or tuples, uses the GIL as a mutex to ensure it will not be interrupted by an operating-system thread switch, which would leave the internal data structures that Python uses for memory management in an inconsistent state.
In addition, the GIL is what allows Python's data structures, such as lists and dictionaries, and, most importantly, internal structures such as execution frames and the dictionaries that store variables, to function transparently in code that uses multiple threads.
Basically, the central Python VM code, the dispatch structure, which is a large "switch/case" over the bytecodes, only advances while it holds the GIL.
I think the most relevant source in English about the GIL is the Python wiki: https://wiki.python.org/moin/GlobalInterpreterLock (much of the information below is there as well).
What are its practical implications in an application that uses threads? Reading superficially, I inferred that, due to the CPython implementation, only one thread is executed by the interpreter at a time. Does this imply that threads do not run in parallel? Does a thread have to release the interpreter for the others to be executed? If so, when does this occur?
Yes, as long as the thread is running Python code. This prevents another thread from being activated in the middle of an expression and altering the values of variables in operations that have to be atomic.
However, whenever a thread performs a blocking action, that is, something that does not depend on running more bytecode but will wait for data to be read from a file or a socket, or simply calls a function in native code (usually C) that has no danger of conflicting with the internal state of the Python interpreter, the GIL is released: the running thread releases the GIL, and other threads can execute their code (even while a CPU core is at 100% processing the native code).
Therefore, programs that use threads to serve various network requests, or to process files from a slow disk, or that do heavy numerical calculations using NumPy, for example, do not suffer as much impact from the GIL. However, programs that have complex algorithms in pure Python, for example many operations on small strings, are heavily impacted: in practice, only one thread runs at a time.
In particular, when using NumPy for numerical processing, many operations release the GIL internally, and the underlying linear algebra libraries can make use of multiple CPU cores, without the developer having to worry about threads or any other feature in the Python part of the code.
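To make the difference concrete, here is a minimal sketch (the function and values are illustrative; exact timings vary by machine) comparing a CPU-bound pure-Python function run sequentially and in two threads:

```python
import threading
import time

def count(n):
    # Pure-Python, CPU-bound loop: holds the GIL while it runs
    while n > 0:
        n -= 1

N = 10**7

# Sequential: the two calls run one after the other
start = time.perf_counter()
count(N)
count(N)
print("sequential: %.2fs" % (time.perf_counter() - start))

# Two threads: no speedup is expected, because only one thread
# can execute Python bytecode at a time
start = time.perf_counter()
threads = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("two threads: %.2fs" % (time.perf_counter() - start))
```

On CPython, the threaded version typically takes about as long as (or slightly longer than) the sequential one. If count were replaced by a blocking call such as time.sleep or a socket read, the two threads would finish in roughly half the time, because the GIL is released while waiting.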
Given that it is a characteristic of CPython, is the GIL exclusive to CPython, or do other interpreters also have this characteristic?
In general, other Python implementations, such as PyPy, have a GIL as well. Jython and IronPython use the threading machinery of their respective VMs (the JVM and the .NET CLR) to keep code consistent across threads, and do not have a GIL.
Continuing with the parts that were not asked:
So how do you work around it?
Multiprocessing
https://docs.python.org/3/library/multiprocessing.html
The simplest way to have thread-style code effectively running in parallel, with no worries about the GIL, is multiprocessing: it is an API designed to mirror the threading API; however, it creates a new process for what would otherwise be a thread. The negative points are: creating a new process is much more "expensive" in terms of system resources than creating a thread, and arguments passed between the subprocesses are serialized with pickle. This causes overhead and prevents non-serializable objects (such as open files, sockets, etc.) from being passed as parameters in the ordinary way.
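A minimal sketch of the idea (the function and values here are just illustrative):

```python
from multiprocessing import Pool

def cpu_heavy(n):
    # Runs in a separate process, each with its own GIL
    return sum(i * i for i in range(n))

if __name__ == "__main__":  # required, since child processes re-import this module
    with Pool(processes=4) as pool:
        # Arguments and results are pickled to cross the process boundary
        results = pool.map(cpu_heavy, [10**6] * 8)
    print(results)
```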
Anyway, see also concurrent.futures: a library that allows the use of thread and process pools in a simplified way, with several control structures that make it easy to create a few processes or threads and reuse them efficiently to perform tasks many times. https://docs.python.org/3/library/concurrent.futures.html
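For example, a small sketch using ProcessPoolExecutor (the task function is illustrative; swap in ThreadPoolExecutor when the work is I/O-bound):

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def cpu_heavy(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Use ThreadPoolExecutor instead when the tasks are I/O-bound
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(cpu_heavy, 10**6) for _ in range(8)]
        for future in as_completed(futures):
            print(future.result())
```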
Asyncio
https://docs.python.org/3/library/asyncio.html
Starting in Python 3.4, asyncio is the new wave of efficient programming in Python. Support was added to the language, initially only as a library, and in Python 3.5 with specific syntax (async/await), so that the various tasks that would typically be spread across different threads in a server application all stay in the same thread, with coroutines allowing context switches between the different tasks.
Asyncio does not exactly "dodge" the GIL: it is simply a form of concurrent programming that explicitly uses a single thread. Typically, at the same points where the GIL would be released, the thread switches to another asynchronous task instead. For cases where the final tasks do not have asynchronous implementations, it is necessary to explicitly run them on a worker that may be on another thread or another process, much like what is done with concurrent.futures.
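A minimal sketch of the single-thread, many-tasks model (asyncio.sleep stands in for real network I/O; asyncio.run requires Python 3.7+):

```python
import asyncio

async def fetch(name, delay):
    # "await" yields control to the event loop, so the other
    # tasks keep running in the same thread while this one waits
    await asyncio.sleep(delay)  # stands in for a real network call
    return name + " done"

async def main():
    # The three tasks run concurrently in a single thread
    results = await asyncio.gather(
        fetch("a", 1), fetch("b", 1), fetch("c", 1)
    )
    print(results)  # completes in about 1 second, not 3

asyncio.run(main())  # asyncio.run exists from Python 3.7 onward
```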
Cython
https://cython.org/
An interesting way to write your CPU-"intense" algorithm in (almost) pure Python and not be subject to the GIL is to use Cython. Cython is a superset of the Python language that translates Python syntax directly into C code using the Python API, with the option of static typing for variables that need to be accessed faster. If you are sure that a CPU-heavy chunk of code will not interfere with variables outside the current thread, Cython has constructs to explicitly release the GIL.
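As a hypothetical illustration, a Cython module (a .pyx file, compiled by Cython rather than run directly by CPython; the names are made up) might release the GIL around a typed, C-level loop:

```cython
# sum_squares.pyx - a hypothetical module, compiled with Cython

def sum_squares(long n):
    cdef long i, total = 0
    # Inside "with nogil" only C-level operations are allowed,
    # so other Python threads can run in parallel with this loop
    with nogil:
        for i in range(n):
            total += i * i
    return total
```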
If you are going to program your own Python extensions in native code, whether in C or in a language like Go, Rust, C++, etc., you are responsible, just as when programming in Cython, for releasing the GIL. More information here: https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock
Celery
http://www.celeryproject.org/
If you have to create backend code that serves multiple requests in parallel with intensive CPU usage, this is the way to go:
Celery is a set of tools that allows the distribution of tasks using an external "broker". Firing a task to be executed asynchronously with Celery is as simple as calling a function, but it has some advantages over the other methods: your Celery workers do not even need to be on the same machine; it is only necessary that all workers can communicate with the broker process (which can be a service like Redis, RabbitMQ, or Amazon SQS). In addition, Celery has a simple built-in system for re-executing tasks if they are not completed in a timely manner.
Celery is very simple to use on a single machine, and it is easy to set up on a set of cloud machines, giving horizontal scalability with virtually zero extra concern.
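A minimal sketch of a Celery task (assuming a Redis broker running locally; the broker URL and the task itself are illustrative):

```python
# tasks.py
from celery import Celery

# The broker URL is an assumption; any broker Celery supports works here
app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def add(x, y):
    return x + y
```

After starting a worker with celery -A tasks worker, firing the task asynchronously is just add.delay(2, 3), which returns immediately; the result is computed by whichever worker picks the task up from the broker.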
PEP 554 - Multiple interpreters
https://www.python.org/dev/peps/pep-0554/
When PEP 554 is implemented (at some point from Python 3.9 onward), it will be possible for pure Python code to have multiple instances of the CPython interpreter in the same process, each with its own GIL. (Code using multiple interpreters this way is quite advanced and will still be somewhat experimental; I only mention it here because we are about to overcome the "barrier of a single GIL" with this.)
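For reference, the API proposed in the PEP looks roughly like this (not yet available at the time of writing, and the names may still change):

```python
# Hypothetical: based on the draft API in PEP 554, not yet available
import interpreters

interp = interpreters.create()  # a new interpreter, with its own GIL
interp.run("print('hello from another interpreter')")
```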
Other
In addition to the above, there are other ways to bypass the GIL, in general other remote code execution libraries: Celery is not the only one, just the most used. There is a PyPy prototype trying to use a "transactional memory" approach to remove the GIL, and there are core Python developers working on a fork that tries to perform a "Gilectomy": eliminating the GIL completely. But this effort will still take a few years. https://github.com/larryhastings/gilectomy