Using functional programming features to remove a word list from another list

I am analyzing a literary text and am having trouble removing a group of words from it. These are not repeated words: link_words are words that would disturb a later analysis of the text, so I want to remove them. The lists below are merely illustrative!

link_words = ["for","on","this","will","do","did","have","has","my","the","as","be","he"]

words = ["for","on","this","will","do","did","have","has","my","the","as","be","he","virus"]

The idea is to apply map() and filter() to remove from the list words the words contained in link_words. I used a smaller set to make it easier. The output should be

["virus"]

Apply the filter:

list(filter(lambda x: x in link_words,words))

whose output is correct:

["for","on","this","will","do","did","have","has","my","the","as","be","he"]

This is what I tried next. With

filtro = filter(lambda x: x in link_words, words)

I am not getting the expected result from

list(map(lambda x: words.remove(x), filtro))

What did I miss? How can I fix the map?

  • Does this answer your question? Remove repeated elements using two lists

  • No need for map and filter, the question suggested above has a solution for your case

  • @hkotsubo I would like to learn how to use them! Could you help me find the error?

  • @hkotsubo In fact, these are not repeated words but a group of words that I want to remove from the text so I can analyze it without them!

  • Do not get stuck on the title of that question; read the answers there, because the problem is the same (removing from one list the elements that are in another)

  • However, map serves to transform the list elements into something else, so the lambda should return something. But remove does not return anything, so the result will be a list full of None. I think map/filter are not the best solution for this case

  • @hkotsubo I agree, but I would like to apply map and filter!

  • @hkotsubo The goal is more to learn how to use map and filter, although it may not be the best approach!


2 answers

2


map serves to transform the list elements into something else, and in your case there is nothing to transform, because you just want to filter.

Since you want all the elements of one list that are not in another, just do:

filtrados = list(filter(lambda x: x not in link_words,words))

That is, the elements of words that are not in link_words.

There is no reason to use map, because the elements are not changed. map only makes sense if you want to generate values different from those in the original list. For example:
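A small variation, offered only as a sketch: `x not in link_words` scans the whole list on every test, so for a real text it may help to convert link_words to a set first. The name link_set is introduced here purely for illustration.

```python
link_words = ["for", "on", "this", "will", "do", "did", "have", "has", "my", "the", "as", "be", "he"]
words = ["for", "on", "this", "will", "do", "did", "have", "has", "my", "the", "as", "be", "he", "virus"]

# link_set is a hypothetical helper name: set membership tests are fast
link_set = set(link_words)

# keep only the words that are NOT link words
filtrados = list(filter(lambda x: x not in link_set, words))
print(filtrados)  # ['virus']
```

The result is the same as with the list; only the cost of each membership test changes.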

lista = [1, 2, 3]
# generates another list, containing double each element
dobros = list(map(lambda x: x * 2, lista))
print(dobros) # [2, 4, 6]

I used map to generate another list, containing twice each number of the original list.

But in your case you are not modifying the list values (in the sense of turning them into other values), you are just filtering (choosing which ones you want and which you don't), so it doesn't make sense to use map here.

The other answer even gives an example that uses map, but it does completely unnecessary things: it creates a list of booleans and then calls index for each element of the list. This makes the algorithm very inefficient, because index performs a linear search, so the list is traversed many times. It is an unnecessary detour just to "force" the use of a feature (so much so that the answer itself acknowledges at the end that there is a better solution that does not need map).

That is, just because it is possible to use map doesn't mean you should. You should not force the use of a feature to solve a problem; you should evaluate the problem and see which feature is the most suitable. And map is not necessary in this case.
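If the goal is to practice both functions, one situation where map does earn its place is normalizing the words before filtering. The sketch below assumes the text has mixed capitalization; raw_words, normalized and kept are illustrative names, and the shortened link_words list is just for the example.

```python
link_words = ["for", "on", "this", "the", "he"]
raw_words = ["The", "Virus", "on", "THIS", "Planet"]  # hypothetical sample input

# map transforms: lowercase every word
normalized = map(str.lower, raw_words)

# filter selects: drop the link words
kept = list(filter(lambda w: w not in link_words, normalized))
print(kept)  # ['virus', 'planet']
```

Here each function does the job it is meant for: map changes the values, filter chooses among them.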

See more about this in this question, which also gives some examples where it makes sense to use map and filter.

See also other solutions for your case (which don't even need filter) here.
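One alternative that needs neither map nor filter is a list comprehension. This is only a sketch of a common approach, not necessarily what the linked answers contain.

```python
link_words = ["for", "on", "this", "will", "do", "did", "have", "has", "my", "the", "as", "be", "he"]
words = ["for", "on", "this", "will", "do", "did", "have", "has", "my", "the", "as", "be", "he", "virus"]

# build a new list with every word that is not a link word
filtrados = [w for w in words if w not in link_words]
print(filtrados)  # ['virus']
```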

2

The Error

The line list(map(lambda x: words.remove(x), filtro)) returns a list containing nothing but None values, along the lines of

[None, None, None, None, None, None, None, None, None, None, None, None, None]

(which is not the expected answer) because words.remove(x) returns None. For example, if you define

words = ["for","on","this","will","do","did","have","has","my","the","as","be","he","virus"]

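and use the remove method, you will get

>>> words.remove("for")
>>> words
['on', 'this', 'will', 'do', 'did', 'have', 'has', 'my', 'the', 'as', 'be', 'he', 'virus']

but if you assign the result of remove to a variable (starting again from the full list, since "for" has already been removed above), you get None:

>>> new_words = words.remove("for")
>>> print(new_words)
None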
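There is a second effect worth seeing, sketched here for Python 3. filter is lazy, so in the question's code filtro is still reading from words while remove is shrinking it, and elements get skipped. If filtro is turned into a list first, every link word is removed, though only as a side effect of map. The names lazy_result and eager_result are illustrative.

```python
link_words = ["for", "on", "this", "will", "do", "did", "have", "has", "my", "the", "as", "be", "he"]
original = ["for", "on", "this", "will", "do", "did", "have", "has", "my", "the", "as", "be", "he", "virus"]

# Case 1: lazy filter, as in the question. The list is mutated while
# the filter is still iterating over it, so every other word is skipped.
words = list(original)
filtro = filter(lambda x: x in link_words, words)
lazy_result = list(map(lambda x: words.remove(x), filtro))
print(lazy_result)  # [None, None, None, None, None, None, None]
print(words)        # ['on', 'will', 'did', 'has', 'the', 'be', 'virus']

# Case 2: materialize the filter first, so iteration uses a separate list.
words = list(original)
filtro = list(filter(lambda x: x in link_words, words))
eager_result = list(map(lambda x: words.remove(x), filtro))
print(len(eager_result))  # 13 (all None)
print(words)              # ['virus']
```

Either way the list returned by map is useless; any change happens inside words itself.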

A possible answer using map and filter

Instead, you can do

words_in_link = list(map(lambda x: x not in link_words, words))
list(filter(lambda x: words_in_link[words.index(x)], words))

which returns

['virus']

Explanation:

words_in_link will be a list with True where the corresponding string in words is not in link_words, and False where it is.

>>> print(words)
['for', 'on', 'this', 'will', 'do', 'did', 'have', 'has', 'my', 'the', 'as', 'be', 'he', 'virus']
>>> print(words_in_link)
[False, False, False, False, False, False, False, False, False, False, False, False, False, True]

Then you use filter as

list(filter(lambda x: words_in_link[words.index(x)], words))

which selects only the strings of words that are not in link_words.
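Note that words.index(x) searches the list from the start for every element. A possible variation, given only as a sketch, pairs each flag with its word using zip so that no lookup is needed. The names flags, pairs and selected are illustrative.

```python
link_words = ["for", "on", "this", "will", "do", "did", "have", "has", "my", "the", "as", "be", "he"]
words = ["for", "on", "this", "will", "do", "did", "have", "has", "my", "the", "as", "be", "he", "virus"]

# map: one boolean per word, True when the word is NOT a link word
flags = list(map(lambda x: x not in link_words, words))

# filter: keep the (flag, word) pairs whose flag is True
pairs = filter(lambda pair: pair[0], zip(flags, words))

# map: extract the word from each surviving pair
selected = list(map(lambda pair: pair[1], pairs))
print(selected)  # ['virus']
```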

A better answer, using only filter

You don't need both filter and map to solve this problem; filter alone is enough:

list(filter(lambda x: x not in link_words, words))

which returns

['virus']
