How do you filter a list by multiple words in Python?

2

How can I filter a list in Python the same way I can in SQL?

For example, in SQL I can combine several AND conditions on a text field using CHARINDEX:

SELECT * FROM TB_Produto AS A (NOLOCK) 
WHERE CHARINDEX('papel', NM_PRODUTO) > 0
AND CHARINDEX('sulfite', NM_PRODUTO) > 0

I would like to do the same thing in Python. I have already written this filter:

results = [t for t in buscaproduto if t.NM_PRODUTO.find(termo) > -1]

But the term I pass in contains the whole phrase (papel sulfite).

The idea is that if I cannot find the phrase papel sulfite, I run a new search for just papel. So I wanted a way to filter by the separate words, where each word can appear anywhere in the string in my column.

  • 1

    Why not write a function busca_termo(termo, texto) -> bool that does all the checks you need, and use it in the expression [t for t in buscaproduto if busca_termo(termo, t.NM_PRODUTO)]? If the function returns True, the element ends up in the final list.

  • @Woss, do you know how to split the string into parts and build checks like if text.find('papel') and if text.find('sulfite') dynamically?

2 answers

1


Going by your description ("if I cannot find the phrase papel sulfite, I run a new search for just papel"), I would do it like this:

def busca_termo(termo, texto):
    # first check whether the whole term is contained in the text
    if termo in texto:
        return True
    # otherwise, look only for the first word of the term in the text
    return termo.split(maxsplit=1)[0] in texto

produtos = ['papel A4', 'caderno', 'papel sulfite', 'sulfite']
termo = 'papel sulfite'

results = [ p for p in produtos if busca_termo(termo, p) ]
print(results) # ['papel A4', 'papel sulfite']

Since you are using find, I understood that the term can be at any position in the string (i.e., termo must be a substring of texto). Since you do not seem to need the index (you only want to know whether it is a substring or not), the documentation itself recommends using the in operator instead of find.
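As a small illustration of that point, the sketch below compares the two forms on a made-up product name (the value of texto and the two result variables are only for illustration). Both tests agree, but in states the intent directly and avoids the comparison against -1.

```python
# Hypothetical sample value, used only for illustration
texto = 'papel sulfite A4'

# str.find returns the index of the first match, or -1 when there is none
achou_com_find = texto.find('sulfite') > -1

# the `in` operator answers the yes/no question directly
achou_com_in = 'sulfite' in texto

print(achou_com_find, achou_com_in)               # True True
print(texto.find('caderno'), 'caderno' in texto)  # -1 False
```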

In the example above, with the term "papel sulfite", I first check whether the whole term is contained in the text. If it is not, I check only whether the first word, "papel", is contained in the text.


I understood that you do not need to search for "sulfite" on its own, but if you want to search for every word of the term, just change busca_termo to:

def busca_termo(termo, texto):
    # first check whether the whole term is contained in the text
    if termo in texto:
        return True
    # otherwise, check whether any word of the term is contained in the text
    return any(palavra in texto for palavra in termo.split())

produtos = ['papel A4', 'caderno', 'papel sulfite', 'sulfite']
termo = 'papel sulfite'

results = [ p for p in produtos if busca_termo(termo, p) ]
print(results) # ['papel A4', 'papel sulfite', 'sulfite']

Now it first searches for "papel sulfite" and, if that is not found, searches separately for "papel" and "sulfite", returning True if it finds either of them.
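If the goal is instead to mirror the SQL query from the question, where every CHARINDEX condition is joined with AND, a variant with all in place of any might look like the sketch below. The name busca_todas, the extra product in the list, and the lowercasing (to make the match case-insensitive) are assumptions for illustration, not part of the original code.

```python
def busca_todas(termo, texto):
    # Hypothetical helper: True only when every word of the term
    # occurs somewhere in the text, in any order.
    # Lowercasing both sides is an assumption (case-insensitive match).
    texto = texto.lower()
    return all(palavra in texto for palavra in termo.lower().split())

produtos = ['papel A4', 'caderno', 'papel sulfite', 'sulfite', 'Sulfite papel 75g']
termo = 'papel sulfite'

results = [p for p in produtos if busca_todas(termo, p)]
print(results)  # ['papel sulfite', 'Sulfite papel 75g']
```

Note that an empty term matches everything here, because all over an empty sequence is True; add a guard if that is not what you want.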

  • 1

    I think your second function is what I wanted. I’ll test it here; it’s much simpler than the one I wrote.

0

Following @Woss’s idea, I wrote a recursive function that checks whether the words of the term are in the string, and whether all of the words have been found before.

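def filtra_busca_termo(termo, termoBusca, achouTermo):
    # termo: the search term; termoBusca: the text being searched
    achou = False
    palavras = termo.split()
    for palavra in palavras:
        if termoBusca.find(palavra) != -1:
            achou = True
        elif achouTermo:
            # achouTermo=True: a single missing word rejects the text
            return False
        elif termo != ' ' and len(palavras) > 1:
            # word not found: drop the last word of the term and try again
            termo = termo.rsplit(' ', 1)[0]
            return filtra_busca_termo(termo, termoBusca, achouTermo)

    return achou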

Then I call it to build a new filtered list:

produtosFiltro = []
achouTermo: bool = False
termo = termo.lower()  # normalize the search term once, outside the loop
for produto in buscaproduto:
    termoBusca = produto.TermoBusca.lower()
    if filtra_busca_termo(termo, termoBusca, achouTermo):
        produtosFiltro.append(produto)
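For comparison, the fallback described in the question (search for the full term, then retry with fewer words when nothing matches) can also be done at the level of the whole list, without recursion. This is only a sketch under assumptions: filtra_com_fallback is a hypothetical name, and it works on plain strings rather than on objects with a TermoBusca attribute.

```python
def filtra_com_fallback(termo, textos):
    # Hypothetical helper: require every word of the term first; if no
    # text matches, drop the last word and try again with the shorter term.
    palavras = termo.lower().split()
    while palavras:
        achados = [t for t in textos if all(p in t.lower() for p in palavras)]
        if achados:
            return achados
        palavras.pop()  # nothing matched, so retry without the last word
    return []

produtos = ['papel A4', 'caderno', 'papel cartão']
print(filtra_com_fallback('papel sulfite', produtos))  # ['papel A4', 'papel cartão']
print(filtra_com_fallback('papel A4', produtos))       # ['papel A4']
```

The difference from the per-item approach is that the term is only shortened when the stricter search returns nothing for the entire list.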
