When to use map() and filter() in Python?

Good afternoon. I have a question about these functions; I'll give some code as an example:

numeros = list(map(lambda x: x-1 , [2, 3, 4, 5]))
print(numeros)
Output:
[1, 2, 3, 4]

In this example, could I use the function filter() instead and get the same output?

lista = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
numeros_pares = list(filter(lambda x: x % 2 == 0, lista))
print(numeros_pares)
Output:
[2, 4, 6, 8, 10]

And here, could I use the map() function and get the same output?

My goal is to understand when I should use map() and when I should use filter().

Thanks in advance for explaining.

  • map transforms each element of the list into another one; the result does not even need to be of the same type as the input (the lambda could return a string or anything else). filter, on the other hand, does not transform anything: it only selects, from the elements of the list, those that satisfy the criterion (which in your case is "the number must be even").

1 answer

The function map maps, while filter filters. They are not equivalent and can never replace each other. In fact, it is quite common to use them together (the operations of mapping and filtering, not necessarily the functions).

The function map ensures that there will always be a result for each input value. Strictly speaking we can state that the length of the sequence resulting from a mapping will always be equal to that of the mapped sequence.

Note that in your example you mapped the sequence [2, 3, 4, 5] and generated [1, 2, 3, 4]. Each input element generated one output value.

Even if the function you apply during the mapping does not explicitly return a value for some input, the corresponding position in the output is still filled, with None:

def pares(x):
    if x % 2 == 0:
        return x

print(list(map(pares, range(10))))
# [0, None, 2, None, 4, None, 6, None, 8, None]

The function filter, in turn, aims to reduce the input sequence based on the condition defined by the filter. It is not guaranteed that the generated sequence will be smaller than the original, because all elements may satisfy the condition, but it is certain that it will never be larger. It is also certain that every value in the generated sequence also belongs to the original sequence. That is, if filter(A) produces B, then the intersection of A with B will always be B.

This can be observed when you apply a filter to the sequence [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] in order to keep only the even numbers, obtaining [2, 4, 6, 8, 10].
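If you want to check these guarantees yourself, something like the sketch below should do. The names is_even, pares and todos are only illustrative, and lista is the list from the question:

```python
lista = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def is_even(x):
    return x % 2 == 0

# Only some elements satisfy the condition: the result is shorter.
pares = list(filter(is_even, lista))
# Every element satisfies the condition: the result has the same length.
todos = list(filter(lambda x: x > 0, lista))

print(pares)                           # [2, 4, 6, 8, 10]
print(len(pares) <= len(lista))        # True: never larger than the input
print(all(x in lista for x in pares))  # True: every value came from the input
print(todos == lista)                  # True: nothing was removed
```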

In fact, both map and filter are classes, so strictly speaking they are not functions: calling them instantiates the respective class. Both instances are iterable objects that produce the values of the final sequence on demand, as they are consumed. That is, the resulting sequence is not stored in memory but produced element by element as it is iterated (which is why the return value is converted to a list before being displayed).
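A small experiment makes this on-demand behavior visible. It reuses the mapping from the question; restante and vazio are just illustrative names:

```python
numeros = map(lambda x: x - 1, [2, 3, 4, 5])

print(isinstance(map, type))     # True: map is a class
print(isinstance(filter, type))  # True: so is filter

# Values are produced one at a time, only when requested.
print(next(numeros))  # 1
print(next(numeros))  # 2

# list() consumes whatever is left...
restante = list(numeros)
print(restante)       # [3, 4]

# ...and after that the object is exhausted.
vazio = list(numeros)
print(vazio)          # []
```

This is also why a map or filter object can only be iterated once; if you need the values again, keep the list.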

As for using them together, imagine a sequence of users with the fields email and active.

users = [
  ('felix@servidor', True),
  ('caitie@servidor', True),
  ('joel@servidor', False),
  ('violet@servidor', False),
  ('traci@servidor', True)
]

The goal is to get the email list only from active users.

With map we can extract the email from each user and return a list with this information:

def get_email(user):
  return user[0]

print(list(map(get_email, users)))
# ['felix@servidor', 'caitie@servidor', 'joel@servidor', 'violet@servidor', 'traci@servidor']

But we don’t want all the emails. So we use filter to obtain only the active users:

def is_active(user):
  return user[1]  # the second field is already a boolean

print(list(filter(is_active, users)))
# [('felix@servidor', True), ('caitie@servidor', True), ('traci@servidor', True)]

But we don’t want all the information, just the email. So we put the two together:

emails = map(get_email, filter(is_active, users))
print(list(emails))
# ['felix@servidor', 'caitie@servidor', 'traci@servidor']

But this is rarely used in practice because the code is not easy to read, while a list comprehension is much simpler and produces the same result:

emails = [get_email(user) for user in users if is_active(user)]
print(emails)
# ['felix@servidor', 'caitie@servidor', 'traci@servidor']

The part get_email(user) plays the role of the mapping, while if is_active(user) does the filtering.


If we were to translate the behavior of both into pure Python functions, map would be:

def map(function, *iterables):
  for values in zip(*iterables):
    yield function(*values)

While the function filter would be:

def filter(function, iterable):
  for value in iterable:
    if function(value):
      yield value
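To check that pure Python sketches like these behave like the built-ins, one option is to define them under other names and compare the results. Here my_map and my_filter are hypothetical names, chosen only so that the built-ins are not shadowed. Note the zip(*iterables): the iterables must be unpacked so that zip pairs up their elements.

```python
def my_map(function, *iterables):
    # zip(*iterables) groups the i-th element of every iterable together
    # and stops at the shortest one.
    for values in zip(*iterables):
        yield function(*values)

def my_filter(function, iterable):
    # Yield the original value, untouched, whenever the test passes.
    for value in iterable:
        if function(value):
            yield value

somas = list(my_map(lambda a, b: a + b, [1, 2, 3], [10, 20]))
print(somas)  # [11, 22]
print(somas == list(map(lambda a, b: a + b, [1, 2, 3], [10, 20])))  # True

pares = list(my_filter(lambda x: x % 2 == 0, range(10)))
print(pares)  # [0, 2, 4, 6, 8]
print(pares == list(filter(lambda x: x % 2 == 0, range(10))))  # True
```

With more than one iterable, the mapping stops at the shortest one, so the output has as many elements as the shortest input. The sketch is also simplified in one respect: the built-in filter accepts None as the function (keeping the truthy elements), which my_filter does not handle.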

In short:

  • map: you control which value is returned, but not when it is returned (it always returns something);
  • filter: you control when a value is returned, but not which value is returned (it always returns the original value).
  • Great answer. I would give more emphasis to the "comprehensions" syntax, which I consider more important and which is hidden at the end, since the code gets shorter (and, depending on the operation, more efficient). You could also talk a little about reduce, which together with the other two makes up the "filter, map, reduce" trio for data operations and which, unlike map and filter, has no simple equivalent using comprehensions.

  • @jsbueno I'll see if I can fit in the suggestions. Thank you.
