Intersection of words with Python dictionaries

Hello!

I tried to do the last exercise in Wendel Melo's book, Introduction to the Python Programming Universe, but I can't get it to work.

Problem statement:

Write a program that reads three strings from the user and lists each word that appears at least once in any of the strings. Each word should be listed only once, and next to it the texts in which it appears. Your program must include a function that receives the strings read and returns a dictionary where the keys are the words and the values are sets indicating the texts in which each key (word) appears.

Example:

[image of the example not available]

I wrote the code below. It prints the words of text 1, text 2 and text 3, but I don't know how to continue, because the words are repeated in the output. I believe a good solution would be to use the set method intersection somewhere in the code, but I don't know how to turn that intersection into the labels text 1, text 2 and text 3, because the method returns only the elements of the intersection and not the sets they came from. Thanks in advance to anyone who can help.

def tDicionario(texto1, texto2, texto3):
    palavras = {}
    
    
    for palavra in texto1:
        print('{}: '.format(palavra))
        
    for palavra in texto2:
        print('{}: '.format(palavra))
        
    for palavra in texto3:
        print('{}: '.format(palavra))
    
    
if __name__ == '__main__':
    
    texto1 = input('Entre com o texto 1: ').split()
    texto2 = input('Entre com o texto 2: ').split()
    texto3 = input('Entre com o texto 3: ').split()
    
    print('Listagem de palavras: ')
    print(tDicionario(texto1, texto2, texto3)) 

2 answers

2


The easiest way is to use a Python dictionary in which each key is a word and its value is the list of numbers of the texts where that word appears. Using sets would also work, but I believe the program's logic is easier to follow this way.

It's also simpler to read each input and add its words to the dictionary right away, rather than reading all the inputs first and processing them separately. If you do this inside a loop, it's easy to check whether the word is already in the dictionary, or whether it has appeared more than once in the same text. It also makes generalizing your program trivial: you can ask the user for an arbitrary number of texts by changing a single line.

See the following example:

# Start with an empty dictionary
resultado = {}

# Number of texts to ask the user for
numero_de_frases = 3

for n in range(numero_de_frases):
    # n starts at 0, so we add 1 and convert it
    # to a string to make things easier later
    num_texto = str(n + 1)
    
    # Read the user's input
    texto = input(f'Entre com o texto {num_texto}: ')
    lista_de_palavras = texto.split()
    
    # Iterate over the list of words
    for palavra in lista_de_palavras:
        # If the word is not in the dictionary yet, create a new list
        if palavra not in resultado:
            resultado[palavra] = list()
        # Get the list of numbers of the texts where the word appears
        lista_num_textos = resultado[palavra]
        # If the word has already appeared in this text, skip the line below
        # (avoids repeated entries when the word is duplicated)
        if num_texto not in lista_num_textos:
            lista_num_textos.append(num_texto)

# Iterate over the dictionary and show the results
for palavra, lista_num_textos in resultado.items():
    string_num_textos = ', '.join(lista_num_textos)
    print(f'A palavra "{palavra}" aparece no(s) texto(s): {string_num_textos}')
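
Since the exercise asks for sets as the dictionary values, the same loop can also be written with sets instead of lists. This is only a minimal sketch, not the book's solution: the function name indexar_palavras is hypothetical, and the texts are passed in as a list so the function can be tried without typing anything.

```python
def indexar_palavras(textos):
    """Map each word to the set of 1-based text numbers where it appears."""
    resultado = {}
    for num_texto, texto in enumerate(textos, start=1):
        for palavra in texto.split():
            # setdefault creates an empty set the first time a word is seen;
            # adding to a set ignores duplicates, so no membership check is needed
            resultado.setdefault(palavra, set()).add(num_texto)
    return resultado


if __name__ == '__main__':
    indice = indexar_palavras(['a casa azul', 'a casa', 'azul azul'])
    for palavra, numeros in indice.items():
        # Sets are unordered, so sort the numbers before displaying them
        lista = ', '.join(str(n) for n in sorted(numeros))
        print(f'A palavra "{palavra}" aparece no(s) texto(s): {lista}')
```

With sets, the check for a word repeated within the same text disappears, because adding an element that is already present has no effect.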

0

According to the third sentence of the question...

Your program must include a function that receives the strings read and returns a dictionary where the keys are the words and the values are sets indicating the texts in which each key (word) appears.

...we can implement a function to solve the problem.

To build this function we can follow these steps:

  1. Read the three texts and store them in a list;
  2. Pass this list as an argument to the function that will build the dictionary;
  3. Build the dictionary;
  4. Display the results in the same format as the one suggested in the question.

One possible implementation of this logic is:

def localiza_palavra(lista_textos):
    o = range(1, len(lista_textos) + 1)
    dicionario = dict()
    for texto, posicao in zip(lista_textos, o):
        for palavra in texto:
            if palavra not in dicionario:
                dicionario[palavra] = list()
            r = dicionario[palavra]
            if posicao not in r:
                r.append(posicao)
    return dicionario


if __name__ == '__main__':
    lista = list()
    for i in range(1, 4):
        lista.append(input(f'Entre com o texto {i}: ').split())

    resultado = localiza_palavra(lista)

    print('Listagem de palavras:')
    for key, item in resultado.items():
        print(f'{f"{key}:":<10}', *[f'texto {x},' for x in item])

Note that when the code runs, we must enter the three texts. Each text is captured as a list of words and then stored in another list.

This list is then passed as an argument to the function localiza_palavra(). The first for loop uses zip() to iterate simultaneously over the lists lista_textos (all the texts typed in) and o (the position numbers of the texts).

The second for loop then goes through each word of the current text, and the first if checks whether the word is missing from the dictionary. If it is, the word is added to the dictionary as a key whose value is an empty list.

This list will be filled with the positions of the texts in which the word appears.

The second if checks whether the position is missing from r, the list of positions for that word. If the position is not in r, it is appended.
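
The pairing of each text with its position, done with zip() in the first for loop, can also be obtained with the built-in enumerate(), which avoids building the range o by hand. The sketch below uses made-up sample data and only shows that both forms yield the same pairs (enumerate puts the position first):

```python
# Sample data: each text is already split into a list of words
lista_textos = [['o', 'gato'], ['o', 'rato'], ['gato', 'e', 'rato']]

# Pairing used in the answer, reordered here as (posicao, texto)
o = range(1, len(lista_textos) + 1)
pares_zip = [(posicao, texto) for texto, posicao in zip(lista_textos, o)]

# Same pairing with enumerate, counting from 1
pares_enumerate = list(enumerate(lista_textos, start=1))

print(pares_zip == pares_enumerate)
```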

After all the iterations have been completed, the results are displayed.

The values are displayed by the last for loop in the code.

The last line of code...

print(f'{f"{key}:":<10}', *[f'texto {x},' for x in item])

...executes the following:

  1. Displays the key (key) followed by ":", left-aligned in a field 10 characters wide;
  2. Displays the unpacked elements of a list built by a list comprehension, where each element is the word texto followed by the number of the text and a comma.
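
The two steps can be seen in isolation with a small sketch. The values of key and item are made up for illustration, and the final join-based line is a hypothetical alternative that avoids the trailing comma, not part of the code above:

```python
key = 'casa'
item = [1, 3]

# The inner f-string builds 'casa:'; the outer one pads it to
# 10 characters, left-aligned (format spec '<10')
rotulo = f'{f"{key}:":<10}'

# The list comprehension builds the pieces that print() receives unpacked
partes = [f'texto {x},' for x in item]
print(rotulo, *partes)

# Alternative without the trailing comma: join the pieces explicitly
linha = rotulo + ' ' + ', '.join(f'texto {x}' for x in item)
print(linha)
```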
