Highlight the main word of each line of a text according to a priority list

Viewed 231 times

0

I have a very simple text base, and I would like to highlight the main word of each line according to a prioritization list. Example:

Base text:

Bom dia 
que dia lindo 
vamos embora 
vamos chutar o dia

Prioritization list: bom dia vamos lindo

Expected result:

Bom dia - Bom 
que dia lindo - Dia 
vamos embora - vamos 
vamos chutar o dia - dia

I managed to do it with a single variable, but not with a list. Code:

texto = 'dia'

for lin in open(r"C:\Users\guilgig\Desktop\teste.txt"):
    if texto in lin:
        print(texto, lin)
  • Why does "Bom dia" have only "Bom" highlighted if "dia" also belongs to the prioritization list?

  • Thank you for editing! It follows the prioritization list because I only need the most important word, so as soon as it finds one of the words, that is enough. But if there is a way to show all the words, that is fine too.

2 answers

2


The code you wrote is already very close to what you need. Just define the prioritization list and iterate over it, looking for each word in the line:

with open('teste.txt') as stream:
    for line in stream:
        for word in ['bom', 'dia', 'vamos', 'lindo']:
            if word in line:
                print(line.strip(), '-', word)
                break

See it working on Repl.it

As for using with to open the file, you can read more in: What is with for in Python?. As a micro-optimization, you can define your word list as a tuple (tuple) or even a set (set), which, for this purpose, have advantages in both memory usage and speed of access to the elements. Keep in mind that a set does not guarantee iteration order, so it only fits if the priority order does not matter.
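The tuple suggestion could look roughly like the sketch below. The helper name first_priority_word is hypothetical and only used for illustration; the matching logic is the same case-sensitive substring test as in the code above.

```python
# Hypothetical helper: the function name is an assumption for illustration.
PRIORITY_WORDS = ('bom', 'dia', 'vamos', 'lindo')  # a tuple keeps the priority order


def first_priority_word(line, words=PRIORITY_WORDS):
    """Return the first entry of `words` that occurs in `line`, or None."""
    for word in words:
        if word in line:  # case-sensitive substring test, as in the answer's code
            return word
    return None


for line in ['Bom dia', 'que dia lindo', 'vamos embora', 'vamos chutar o dia']:
    print(line, '-', first_priority_word(line))
```

Because the test is case-sensitive, 'Bom dia' is matched by 'dia' rather than 'bom'; lowercasing the line before the test would change that.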

0

You can do it using two for loops:

principais_palavras = "bom dia vamos lindo"

base = """Bom dia
que dia lindo
vamos embora
vamos chutar o dia"""

lista_prioritaria = principais_palavras.lower().split(" ")
linhas = base.lower().split("\n")

for linha in linhas:
    for palavra in lista_prioritaria:
        if palavra in linha:
            print(linha + " - " + palavra)
            break

Reading the file is not part of the algorithm, so I left it out; just adapt it to your case.
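One possible way to adapt this to a file is sketched below. The function name highlight_lines is hypothetical, and io.StringIO only stands in for a real file so the example runs on its own; an object returned by open() can be passed in the same way. The line is lowercased only for the comparison, so the original capitalization is kept in the output.

```python
import io


def highlight_lines(lines, priority_words):
    """Yield (line, word) with the first priority word found in each line.

    `lines` is any iterable of strings, such as an open text file.
    Lines that contain none of the words are skipped.
    """
    for raw in lines:
        line = raw.strip()
        lowered = line.lower()  # lowercase copy used only for the comparison
        for word in priority_words:
            if word in lowered:
                yield line, word
                break


# io.StringIO stands in for a file here (an assumption to keep the sketch
# self-contained); replace it with open(...) for a real file.
stream = io.StringIO("Bom dia\nque dia lindo\nvamos embora\nvamos chutar o dia\n")
for line, word in highlight_lines(stream, ['bom', 'dia', 'vamos', 'lindo']):
    print(line, '-', word)
```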

  • 1

    If I may comment: using lower() is a good idea if the intention is to ignore letter case, but calling split() on the line to convert it into a word list is unnecessary. The in operator also works on strings, where it checks whether a substring is present.

  • As for lower(), I added it because I assumed case should be ignored in the search, based on the example given by the OP. As for split(), you're absolutely right. I don't know what I was thinking. I'll simplify it. Thank you. :)
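To illustrate the difference discussed in these comments, here is a small sketch comparing the two tests. The function names are hypothetical. Which behavior is wanted depends on the use case: the substring test also matches inside longer words, while the split() version only matches whole tokens.

```python
def has_substring(word, line):
    """True if `word` occurs anywhere in `line`, even inside a longer word."""
    return word in line.lower()


def has_whole_word(word, line):
    """True if `word` is one of the whitespace-separated tokens of `line`.

    Note that split() leaves punctuation attached to tokens, so 'dia,'
    would not count as 'dia' here.
    """
    return word in line.lower().split()


sample = 'A média subiu'
print(has_substring('dia', sample))   # 'dia' is found inside 'média'
print(has_whole_word('dia', sample))  # no token is exactly 'dia'
```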
