These are three different data structures, at both the usage and the implementation level, and each one is better suited to a particular purpose. (Note: in the text below I use the Python type names list, tuple and set.)
The list and the tuple are the most similar to each other: both are sequences, meaning that they hold data in a defined order. The biggest difference is that a list is a mutable sequence: once you create the list, you can keep adding elements to it, changing existing elements and even deleting elements. The tuple is an immutable sequence: once created, it can no longer be changed.
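A minimal sketch of that difference (the variable names are made up for illustration):

```python
# A list can be changed after it is created.
words = ["spam", "eggs"]
words.append("ham")   # include a new element
words[0] = "bacon"    # change an existing element
del words[1]          # delete an element ("eggs")

# A tuple rejects any attempt to change it.
point = (10, 20)
try:
    point[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(words)          # the list was modified in place
print(tuple_changed)  # the tuple was not
```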
tuple: In terms of use, the immutability of the tuple makes it possible to use it as a dictionary key, for example (as long as its elements are hashable too). Lists, on the other hand, cannot be dictionary keys.
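For example, with hypothetical data that maps coordinate pairs to city names, a tuple works as a key while a list in the same role raises TypeError:

```python
# Hypothetical data: (latitude, longitude) tuples used as dictionary keys.
cities = {(-23.55, -46.63): "São Paulo", (38.72, -9.14): "Lisboa"}
city = cities[(38.72, -9.14)]

# A list in the same role is rejected because lists are not hashable.
try:
    cities[[0.0, 0.0]] = "nowhere"
    list_key_accepted = True
except TypeError:
    list_key_accepted = False
```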
Moreover, largely by tradition and practicality, each element of a tuple tends to represent a specific piece of data, almost as if the tuple were a "struct", although nothing requires this to be declared. So much so that at some point the "namedtuple" was introduced: objects that work exactly like tuples, but which have named fields, so each element can be accessed not only by index, with the [ ] operator, but also by name, with the . operator. Tuples are also created automatically by Python in some situations. In particular, when there is no ambiguity, the parentheses around a tuple are optional: just write the elements separated by commas.
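A short sketch of both points, using collections.namedtuple from the standard library. The type name Point, its fields and the min_max function are invented for this example:

```python
from collections import namedtuple

# "Point" and its field names are made up for this example.
Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)

by_index = p[0]  # access with the [ ] operator, like any tuple
by_name = p.x    # access by name with the . operator

# Without ambiguity the parentheses are optional: this creates a tuple.
pair = 1, "one"

# Python also builds a tuple when a function returns several values.
def min_max(values):
    return min(values), max(values)

result = min_max([5, 2, 9])
```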
list: In contrast, lists are generally used for more homogeneous, "anonymous" data. Hardly anyone creates a list with 3 elements in which the first is a numerical index, the second a string, and the third the normalized string. It is more common that, for example, each element in the list is a word from a text, or a line from a file. Note that nothing forces it to be like this: these distinct uses end up arising naturally as one gains experience with the language.
Internally, tuples are a little more efficient than lists - but from the point of view of the Python programmer it doesn’t make much difference.
set: Sets, in turn, are quite different. Just like the tuple and the list they contain data (all three are "containers"), but the data has no order: no matter what order you put the data into a set, you can only get back an arbitrary element, or go through all the elements (with a for), but in an unspecified order.
Moreover, each element can be present only once in a set. If you add a repeated element, the copy is simply ignored. In this sense they are like the sets defined in mathematics (which we learn about from primary school on). If I have, in mathematics, the set of odd numbers from 1 to 9, and try to add the number 3 to that set, it does not change: the 3 was already in it.
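The same example written in Python, as a small illustration:

```python
odds = {1, 3, 5, 7, 9}
size_before = len(odds)

odds.add(3)  # 3 is already in the set, so nothing changes
size_after = len(odds)

# Repeats inside a set literal collapse in the same way.
letters = {"a", "b", "a"}
```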
Another difference between Python sets and tuples and lists is that all objects in a set have to be immutable (or, more precisely, have to have a well-defined hash). That is, lists cannot be inserted into sets, but tuples can.
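A quick sketch of that rule. Note that a tuple is only hashable when everything inside it is hashable too:

```python
pairs = set()
pairs.add((1, 2))  # a tuple of numbers is hashable, so it is accepted

try:
    pairs.add([1, 2])  # a list is not hashable
    list_accepted = True
except TypeError:
    list_accepted = False

try:
    pairs.add((1, [2]))  # a tuple containing a list is not hashable either
    nested_accepted = True
except TypeError:
    nested_accepted = False
```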
In particular, because there is no order among the elements of a set, essentially the only thing you can ask about an element is whether it belongs to the set, with the in operator: 0 in {0, 2, 3, 4} will return True, for example. And this "check whether an element is in the collection" operation can be much faster for a set than for a list, up to thousands of times faster, because the lookup in a set takes on average constant time, regardless of the size of the set. In sequences, membership is checked linearly, element by element. In practice, for sets or lists of up to 20 elements the difference is negligible, but if you have to find which of 100,000 elements are in a mass of 100,000,000 elements, it can be the difference between a program that takes hours to run and one that finishes in two seconds.
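A rough way to see this on your own machine, using timeit from the standard library. The sizes below are arbitrary and much smaller than the numbers in the text so that the sketch runs quickly. The measured times depend on the machine, so none are shown here:

```python
import timeit

# Arbitrary sizes, kept small so the example runs quickly.
haystack_list = list(range(100_000))
haystack_set = set(haystack_list)
needle = 99_999  # worst case for the list: the last element

list_time = timeit.timeit(lambda: needle in haystack_list, number=50)
set_time = timeit.timeit(lambda: needle in haystack_set, number=50)
print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")

# Filtering many candidates: build the set once, then test each candidate.
candidates = [5, 123_456, 99_999, -1]
found = [c for c in candidates if c in haystack_set]
```

Building the set has a cost of its own, so the conversion pays off when the same collection is queried many times.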
But you don't have the .index method (which exists for lists and tuples) to find the position of an element: the elements in a set have no position.
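A small illustration. If a position is really needed, one option is to build a sequence from the set first, for example with sorted:

```python
letters_list = ["a", "b", "c"]
position = letters_list.index("b")  # lists (and tuples) know positions

letters_set = {"a", "b", "c"}
has_index_method = hasattr(letters_set, "index")

try:
    letters_set[0]  # sets do not support indexing
    subscriptable = True
except TypeError:
    subscriptable = False

# Building a sorted list from the set gives the elements a position again.
ordered = sorted(letters_set)
```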
A very common use of sets is precisely to remove duplicates from a sequence. Suppose you have a list of words taken from a text, some of them repeated dozens of times, and you want just one of each. Just do: palavras = set(lista_de_todas_as_palavras) - all duplicates "collapse" automatically, and you get a set with one of each word (remember: in no particular order).
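A sketch with a made-up sentence as input. The last two lines are additions of mine, not part of the recipe above: sorted gives a predictable order, and dict.fromkeys is one way to drop duplicates while keeping first-seen order (dictionaries preserve insertion order since Python 3.7):

```python
# Made-up input for the example.
lista_de_todas_as_palavras = "to be or not to be that is the question".split()

palavras = set(lista_de_todas_as_palavras)  # duplicates collapse, order is lost

# Two ways to get a predictable order back, if that matters.
alphabetical = sorted(palavras)
first_seen_order = list(dict.fromkeys(lista_de_todas_as_palavras))
```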
In addition, sets implement the most common set operations we learned in mathematics: intersection, union and difference, among others. You can use methods with the corresponding names (run dir(set()) at the interactive prompt to see the available methods), and these operations also work directly with Python operators such as -, | and &:
In [189]: a = {1, 2, 3}
In [190]: b = {3, 4, 5}
# difference
In [191]: a - b
Out[191]: {1, 2}
# union
In [193]: a | b
Out[193]: {1, 2, 3, 4, 5}
# intersection:
In [194]: a.intersection(b)
Out[194]: {3}
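The same session as a plain script, showing that each operator has an equivalent method. The symmetric difference and subset checks are extra operations not used in the session above:

```python
a = {1, 2, 3}
b = {3, 4, 5}

difference = a - b    # same as a.difference(b)
union = a | b         # same as a.union(b)
intersection = a & b  # same as a.intersection(b)

# Two more operations from the same family.
symmetric_difference = a ^ b  # elements that are in exactly one of the sets
is_subset = {1, 2} <= a       # same as {1, 2}.issubset(a)
```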
It is possible to spend years programming in Python without ever needing sets. But when the opportunity arises, the mere fact that they exist can save several hours and dozens of lines of code at once.
frozenset: Another, less well-known type (among other things because it has no literal syntax of its own in the language, unlike sets, lists and tuples) is the "frozenset". A frozenset is to a set roughly what a tuple is to a list: once created, it cannot be changed, so you cannot add new elements or remove anything that is there. Thus, frozensets can be used as dictionary keys and as elements of sets (and of other frozensets, of course), which normal sets cannot.
You create a frozenset by calling it as if it were a function, passing an iterable object as the single argument (the 4 types covered here are all iterable): meus_dados = frozenset({1, 2, 3, 4})
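A small sketch of those uses. The scores dictionary and the names in it are hypothetical:

```python
meus_dados = frozenset({1, 2, 3, 4})

# A frozenset has no methods that change it.
frozen_has_add = hasattr(meus_dados, "add")

# Hypothetical data: a frozenset used as a dictionary key.
scores = {frozenset({"ana", "bia"}): 3}
score = scores[frozenset({"bia", "ana"})]  # element order does not matter

# Frozensets as elements of a set: equal frozensets collapse into one.
groups = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}

# A normal set cannot be an element of another set.
try:
    groups.add({4, 5})
    plain_set_accepted = True
except TypeError:
    plain_set_accepted = False
```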
What is the main difference between a Tuple and a List? | When to use lists and when to use tuples? | Is there a performance difference between Tuple and List? | What's the set for in Python?
– Woss
Possible duplicate of What is the main difference between a Tuple and a List?
– Ivan Noleto
@Ivan the question you quoted as possible duplicate does not address the set issue.
– Cadu