How does Python manage memory when assigning different types?


Asked 4 years, 1 month ago


I would like to understand how dynamic typing is implemented.

In Python, for example, when we create a variable holding an int and then assign a string to that same variable, the fact that this causes no problem with memory allocation confuses me. What happens in Python? Are all types treated as specializations of one main object type, so that everything works through upcasting and downcasting?


If you read English, the Memory Management page of the documentation explains how memory management works in Python 3. Unfortunately there is no version of this page in Portuguese.

– G. Bittencourt

2020/05/27 at 17:22

1 answer


by Maniero • **444,682** points · Answer 1 · 2020-05-27T18:20:18+00:00

Dynamically typed languages usually have only one internal type, the object. This type has the infrastructure to manage the other kinds of data.

I assume you understand how data is stored in memory: you need the type to know how much space to allocate. Since Python and other dynamically typed languages have only one data type internally, this is easy: all basic objects are allocated the same way, with storage of the same size.

When I talk about allocation here I mean the space, not the management of that allocation. Memory management is an orthogonal concept that has nothing to do with dynamic typing; it can work in a similar way in statically typed languages. I say this because a comment brings up that subject, which is not what the question wants to know; the question used the wrong term because the author was not sure what exactly it was about.

This object is typically a structure that needs two basic pieces of information (at the end I show what Python specifically uses): which kind of value it holds, and the value itself.

The type tag stored there is what determines the type you see in Python, so the typing is done in another layer. It can be 1 byte, with each number indicating a different type. It does not have to be that way, but it is a simple and easy approach.

The other piece of information is the value. It can have a fixed size, for example 8 bytes.

So this pair will always take 9 bytes. Done: you know you will always allocate this size.

But what does that value hold?

If the type is a boolean, it uses only 1 of those 8 bytes and holds 0 or 1. If it is a number, it holds a floating-point value, or an integer as an optimization (each language may or may not do this). Other basic numeric types work the same way and occupy from 1 to 8 of the bytes already reserved.

Things change a little when you have a string. This type is handled by reference rather than by value, so those 8 bytes hold a pointer to the text. Reference types always exist in two parts: the pointer and the pointed-to object, which is where the actual data lives. That extra allocation can vary in size and is determined at run time, so it is most commonly allocated on the heap. This works the same way in languages without dynamic typing.

The same goes for other reference types such as the list and the dictionary: only the pointer to the object goes there, and any pointer fits in 8 bytes.

If you know C, this object would look more or less like this:
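
To make the split between the pointer and the pointed-to object concrete, here is a small Python sketch (the variable names are only for illustration). Two names bound to the same list share one heap object, so a change made through one name is visible through the other, while an explicit copy allocates a separate object:

```python
# Two names can hold a reference to the same heap-allocated list.
first = [1, 2]
second = first                  # copies the reference, not the list

second.append(3)                # mutate the shared object through one name

same_object = first is second   # identity check: both names point at one object

copy = list(first)              # an explicit copy allocates a new list
copy.append(4)                  # does not affect the original

print(same_object, first, copy)
```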

struct {
    char type; // indicates which type is stored
    union { // only one of the members below is used; space is always reserved for the largest
        char boolean; // only when it is a boolean
        double number; // only when it is a number
        void *reference; // only when it is an object held by reference
    } value;
} object;

This is just an illustration; it does not have to be exactly like this.

This is a way to build a tagged union by hand, since C has no such mechanism built in; in other languages the tagging could be more automatic.

So throughout the code, this is the object that gets passed around. The language runtime knows that, before accessing the value, it has to check which type is expected in that context and which type the object being manipulated has, to decide whether it can perform the operation or must raise an error because they are incompatible. To find out the type of an object, just look at the type field shown above; its value tells you the kind (for example, 0 could be None, 1 a boolean, 2 a number, 3 a string, 4 a list, 5 a dictionary, 6 an object, and so on).

The language ensures that every time you store a value there, the type field is updated to stay compatible. If your code changes the object, the value and the type change together.
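
The 1-byte tag plus 8-byte payload layout can be imitated in Python with the standard `struct` module. This is only a sketch of the idea: the tag numbers and the helper names `pack_boolean`, `pack_number` and `unpack` are made up for the illustration and are not how CPython stores its objects.

```python
import struct

# Hypothetical tag numbers, chosen only for this illustration.
TAG_BOOLEAN = 1
TAG_NUMBER = 2

# "<" disables alignment padding: a 1-byte tag (B) followed by 8 payload bytes.
BOOLEAN_LAYOUT = "<B?7x"   # tag, 1-byte boolean, 7 unused bytes
NUMBER_LAYOUT = "<Bd"      # tag, 8-byte double

def pack_boolean(flag):
    return struct.pack(BOOLEAN_LAYOUT, TAG_BOOLEAN, flag)

def pack_number(value):
    return struct.pack(NUMBER_LAYOUT, TAG_NUMBER, value)

def unpack(raw):
    tag = raw[0]                       # always inspect the tag first
    if tag == TAG_BOOLEAN:
        return struct.unpack(BOOLEAN_LAYOUT, raw)[1]
    if tag == TAG_NUMBER:
        return struct.unpack(NUMBER_LAYOUT, raw)[1]
    raise TypeError(f"unknown tag {tag}")

boxed_flag = pack_boolean(True)
boxed_number = pack_number(3.5)

print(len(boxed_flag), len(boxed_number))       # both records have the same size
print(unpack(boxed_flag), unpack(boxed_number))
```

Both records come out with the same length whatever they hold, which is the point of the fixed-size layout. A C compiler would typically add alignment padding to the struct, so the real size is often larger than 9 bytes.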
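
The two rules above (check the tag before operating, and keep the tag in sync with the value) can be sketched in Python itself. The `Tag` enum, the `Box` class and the `add` function are hypothetical names used only for this illustration; the tag numbers follow the example in the text.

```python
from enum import IntEnum

class Tag(IntEnum):
    # Illustrative tag numbers, matching the example values in the text.
    NONE = 0
    BOOLEAN = 1
    NUMBER = 2
    STRING = 3

class Box:
    """Hypothetical tagged value: the tag is always set together with the payload."""

    def __init__(self, payload=None):
        self.store(payload)

    def store(self, payload):
        # Changing the payload updates the tag in the same step.
        if payload is None:
            self.tag = Tag.NONE
        elif isinstance(payload, bool):           # test bool before numbers
            self.tag = Tag.BOOLEAN
        elif isinstance(payload, (int, float)):
            self.tag = Tag.NUMBER
        elif isinstance(payload, str):
            self.tag = Tag.STRING
        else:
            raise TypeError("unsupported payload in this sketch")
        self.payload = payload

def add(left, right):
    # Check both tags before touching the payloads.
    if left.tag == right.tag and left.tag in (Tag.NUMBER, Tag.STRING):
        return Box(left.payload + right.payload)
    raise TypeError(f"cannot add {left.tag.name} and {right.tag.name}")

box = Box(10)
tag_before = box.tag        # Tag.NUMBER
box.store("ten")
tag_after = box.tag         # Tag.STRING

total = add(Box(1), Box(2))
print(tag_before.name, tag_after.name, total.payload)
```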

All of this is done dynamically. It has nothing to do with the language being interpreted or compiled; it can be done either way, and in fact today it is common for Python to run in compiled form.
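
The situation from the question can be observed directly. In CPython, assigning a string to a name that held an int does not overwrite the int in place; the name is simply rebound to a different object, and each object carries its own type. A minimal sketch (variable names are arbitrary):

```python
value = 42
first_type = type(value)       # the int object knows its own type
first_id = id(value)

value = "forty-two"            # the name is rebound to a different object
second_type = type(value)      # the str object knows its own type
second_id = id(value)

print(first_type.__name__, second_type.__name__)
print(first_id != second_id)   # two distinct objects were involved
```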

Conversion

When the type is not the one you want, you have to use some function that generates another object for you. In some cases you may want to change the object itself (it then becomes the new type "forever"). Some operations only need a reinterpretation: for example, a boolean can be interpreted as an ordinary number without generating another number, since a boolean is represented as a number and the runtime knows what to do.

Note that Python requires you to do this explicitly, because it has strong typing, a concept that people often confuse.

As a curiosity, when you have an object in Python, it internally carries another type saying what it is, so there are 3 layers of type: the single type Python uses to represent everything, the type object, and the more specific type of the object (the class name). The first is not accessible or important to anyone who only uses the language.

Upcasting and downcasting are used mostly on types that are objects. Casting exists for other types too, but it is less common and in some cases actually requires a conversion, not just a reinterpretation. Real casting does not convert (we sometimes say casting when a conversion happens, but that is just for simplicity). This is not very different from other languages; it is just a matter of understanding how the data is actually stored.
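
A short illustration of what explicit conversion means in practice: mixing an int and a string raises an error, converting one side by hand works, and a boolean can be used as a number without any conversion because `bool` is a subclass of `int`.

```python
mixed_error = None
try:
    result = 1 + "2"                 # no implicit conversion between int and str
except TypeError as error:
    mixed_error = type(error).__name__

explicit_sum = 1 + int("2")          # int() builds a new int object from the text
explicit_text = str(1) + "2"         # str() builds a new str object from the number

bool_as_number = True + 1            # a bool is reinterpreted as the number 1

print(mixed_error, explicit_sum, explicit_text, bool_as_number)
```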
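
The layers that are visible from Python code can be inspected with `type()` and `isinstance()`. The class `Invoice` below is a made-up example; the innermost layer (the C structure) is not reachable this way.

```python
class Invoice:
    pass

item = Invoice()

specific_type = type(item)              # the class of this particular object
type_of_type = type(specific_type)      # classes are objects too; their type is `type`
is_object = isinstance(item, object)    # every value is an instance of `object`
mro = [cls.__name__ for cls in Invoice.__mro__]

print(specific_type.__name__, type_of_type.__name__, is_object, mro)
```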

I do not want to go into too much detail, but you can read more on the subject here on the site; I answered in "How Python handles static and dynamic variables?".

"How Python handles and represents an array internally?" has more information specifically on arrays, including snippets of Python code showing how it actually stores objects.

More details in "Typed and untyped programming languages".

Python structures

The actual structure that Python uses should be this (unless it has changed by the time you read this, since it is an internal detail of the language and can change whenever and however the developers want). It is not that simple and actually has several parts, for example:

typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    PyTypeObject *ob_type;
} PyObject;

Note the reference count field: reference counting is the main mechanism Python uses for garbage collection.

Also:
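
The `ob_refcnt` field can be watched from Python through `sys.getrefcount`. This sketch assumes CPython; the absolute numbers are an implementation detail (the call itself holds a temporary reference), so only the differences are meaningful. `Payload` is a throwaway class for the example.

```python
import sys

class Payload:
    pass

target = Payload()
before = sys.getrefcount(target)

holders = [target, target]          # two additional references to the same object
during = sys.getrefcount(target)

holders.clear()                     # drop them again
after = sys.getrefcount(target)

print(during - before, after - before)
```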

typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;
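
The effect of the variable part counted by `ob_size` shows up in `sys.getsizeof`: for a variable-sized type such as `tuple`, every extra item adds the same number of bytes. A small sketch, assuming CPython (the exact byte counts depend on the version and platform, so none are shown):

```python
import sys

# Sizes of tuples holding 0, 1, 2 and 3 items.
sizes = [sys.getsizeof(tuple(range(n))) for n in range(4)]

# Growth between consecutive lengths.
steps = [bigger - smaller for smaller, bigger in zip(sizes, sizes[1:])]

print(sizes)
print(steps)    # the same step each time: one pointer-sized slot per item
```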

Another:

typedef struct _typeobject {
    PyObject_VAR_HEAD
    const char *tp_name; /* For printing, in format "<module>.<name>" */
    Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */

    destructor tp_dealloc;
    Py_ssize_t tp_vectorcall_offset;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    PyAsyncMethods *tp_as_async; /* formerly known as tp_compare (Python 2)
                                    or tp_reserved (Python 3) */
    reprfunc tp_repr;

    /* Method suites for standard classes */

    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    /* More standard operations (here for binary compatibility) */

    hashfunc tp_hash;
    ternaryfunc tp_call;
    reprfunc tp_str;
    getattrofunc tp_getattro;
    setattrofunc tp_setattro;

    /* Functions to access object as input/output buffer */
    PyBufferProcs *tp_as_buffer;

    /* Flags to define presence of optional/expanded features */
    unsigned long tp_flags;

    const char *tp_doc; /* Documentation string */

    /* call function for all accessible objects */
    traverseproc tp_traverse;

    /* delete references to contained objects */
    inquiry tp_clear;

    /* rich comparisons */
    richcmpfunc tp_richcompare;

    /* weak reference enabler */
    Py_ssize_t tp_weaklistoffset;

    /* Iterators */
    getiterfunc tp_iter;
    iternextfunc tp_iternext;

    /* Attribute descriptor and subclassing stuff */
    struct PyMethodDef *tp_methods;
    struct PyMemberDef *tp_members;
    struct PyGetSetDef *tp_getset;
    struct _typeobject *tp_base;
    PyObject *tp_dict;
    descrgetfunc tp_descr_get;
    descrsetfunc tp_descr_set;
    Py_ssize_t tp_dictoffset;
    initproc tp_init;
    allocfunc tp_alloc;
    newfunc tp_new;
    freefunc tp_free; /* Low-level free-memory routine */
    inquiry tp_is_gc; /* For PyObject_IS_GC */
    PyObject *tp_bases;
    PyObject *tp_mro; /* method resolution order */
    PyObject *tp_cache;
    PyObject *tp_subclasses;
    PyObject *tp_weaklist;
    destructor tp_del;

    /* Type attribute cache version tag. Added in version 2.6 */
    unsigned int tp_version_tag;

    destructor tp_finalize;

} PyTypeObject;
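
Several of these fields are mirrored by attributes on type objects, so they can be read without touching C. The mapping in the comments reflects CPython behaviour as I understand it; treat this as a sketch, not as a complete description of `PyTypeObject`.

```python
import sys

type_name = int.__name__              # mirrors tp_name for built-in types
tuple_basic = tuple.__basicsize__     # mirrors tp_basicsize
tuple_item = tuple.__itemsize__       # mirrors tp_itemsize
bool_base = bool.__base__             # mirrors tp_base
bool_mro = bool.__mro__               # mirrors tp_mro

# For tuples, one extra element costs exactly tp_itemsize bytes.
per_item = sys.getsizeof((1, 2)) - sys.getsizeof((1,))

print(type_name, tuple_basic, tuple_item, per_item)
print(bool_base.__name__, [cls.__name__ for cls in bool_mro])
```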

One more, used in certain cases:

typedef struct {
    PyObject_HEAD
    Py_ssize_t length;          /* Length of raw Unicode data in buffer */
    Py_UNICODE *str;            /* Raw Unicode buffer */
    long hash;                  /* Hash value; -1 if not set */
    int state;                  /* != 0 if interned. In this case the two
                                 * references from the dictionary to this object
                                 * are *not* counted in ob_refcnt. */
    PyObject *defenc;           /* (Default) Encoded version as Python
                                   string, or NULL; this is used for
                                   implementing the buffer protocol */
} PyUnicodeObject;
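
The `state` field above records whether a string is interned. Note that this particular layout comes from older CPython versions; the string object was reorganized in Python 3.3 (PEP 393), but interning still exists and can be observed with `sys.intern`. A minimal sketch, assuming CPython:

```python
import sys

parts = ["py", "thon", "_", "internals"]

# Strings built at run time are separate objects even when their text is equal.
first = "".join(parts)
second = "".join(parts)
equal_but_distinct = first == second and first is not second

# Interning maps equal text to one shared object.
interned_first = sys.intern(first)
interned_second = sys.intern(second)
shared = interned_first is interned_second

print(equal_but_distinct, shared)
```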

There are others; you can see them in the source code of the language runtime.
