How to create a dictionary mapping each word to its adjacent words from a string?


0

I have the following string:

texto = "We are not what we should be We are not what we need to be But at least we are not what we used to be"

This is the result I want: each word in the text should map to the list of words that immediately follow it.

Example:

retorno = {'we': ['are', 'should', 'are', 'need', 'are', 'used'], 'are': ['not', 'not']}

3 answers

4


To get each word and the word that follows it, we can split the string on whitespace and use the zip function to group consecutive words into pairs:

texto = "We are not what we should be We are not what we need to be But at least we are not what we used to be"
palavras = texto.split()

for a, b in zip(palavras, palavras[1:]):
    print(a, b)  # a is a word, b is the word that follows it
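
To see what this pairing produces, here is a small sketch that collects the pairs into a list instead of looping over them (the name pares is only illustrative). Because zip stops at the end of the shorter input, the last word, which has no successor, is dropped automatically.

```python
texto = "We are not what we should be We are not what we need to be But at least we are not what we used to be"
palavras = texto.split()

# Pair each word with the one right after it; zip stops when the
# shorter sequence (palavras[1:]) runs out, giving len(palavras) - 1 pairs
pares = list(zip(palavras, palavras[1:]))

print(pares[:3])                # [('We', 'are'), ('are', 'not'), ('not', 'what')]
print(len(palavras), len(pares))  # 26 25
```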

Since you want to build a dictionary of lists, we can use collections.defaultdict to simplify things:

from collections import defaultdict

# Missing keys are created automatically with an empty list
resultado = defaultdict(list)

texto = "We are not what we should be We are not what we need to be But at least we are not what we used to be"
palavras = texto.split()

for a, b in zip(palavras, palavras[1:]):
    # Lowercase both words so "We" and "we" share the same key
    resultado[a.lower()].append(b.lower())

resultado will then hold the equivalent of:

{
    'we': ['are', 'should', 'are', 'need', 'are', 'used'], 
    'are': ['not', 'not', 'not'], 
    'not': ['what', 'what', 'what'], 
    'what': ['we', 'we', 'we'], 
    'should': ['be'], 
    'be': ['we', 'but'], 
    'need': ['to'], 
    'to': ['be', 'be'], 
    'but': ['at'], 
    'at': ['least'], 
    'least': ['we'], 
    'used': ['to']
}
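
One caveat with defaultdict: merely reading a missing key with square brackets inserts that key with an empty list. The minimal sketch below shows this, and shows that converting to a plain dict with dict() (the names final and 'xyz' are illustrative) gives back the usual behavior where .get() returns None without inserting anything.

```python
from collections import defaultdict

texto = "We are not what we should be We are not what we need to be But at least we are not what we used to be"
palavras = texto.split()

resultado = defaultdict(list)
for a, b in zip(palavras, palavras[1:]):
    resultado[a.lower()].append(b.lower())

# Copy into a plain dict before handing the result to other code
final = dict(resultado)

# On the defaultdict, reading a missing key with [] silently inserts []
print(resultado['xyz'])    # []
print('xyz' in resultado)  # True

# The plain dict was copied before that, so it is unaffected;
# .get() returns None instead of inserting anything
print('xyz' in final)      # False
print(final.get('xyz'))    # None
```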

2

First, split the sentence into words:

words = texto.lower().split()

With this list of words, simply iterate over it and append each word's next word. To save some work, you can use collections.defaultdict, which automatically creates an empty list for each new key. The code would look like this:

import collections

# Each new key starts with an empty list
adjacente = collections.defaultdict(list)

# Stop one word early: the last word has no successor
for (i, word) in enumerate(words[:-1]):
    next_word = words[i + 1]
    adjacente[word].append(next_word)

Note that we slice with [:-1] so the loop covers only the first n - 1 words, since the last word has no word after it.
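
As a quick check, this sketch compares the index-based loop over words[:-1] with the zip approach and confirms that they produce the same pairs (pares_indice and pares_zip are illustrative names). Since i stops at len(words) - 2, words[i + 1] never goes out of range.

```python
texto = "We are not what we should be We are not what we need to be But at least we are not what we used to be"
words = texto.lower().split()

# Index-based: i runs from 0 to len(words) - 2, so words[i + 1] is always valid
pares_indice = [(word, words[i + 1]) for i, word in enumerate(words[:-1])]

# zip-based: pairs each word with the next and stops at the shorter sequence
pares_zip = list(zip(words, words[1:]))

print(pares_indice == pares_zip)          # True
print(len(words) - 1, len(pares_indice))  # 25 25
```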

And the result:

adjacente
defaultdict(list,
        {'are': ['not', 'not', 'not'],
         'at': ['least'],
         'be': ['we', 'but'],
         'but': ['at'],
         'least': ['we'],
         'need': ['to'],
         'not': ['what', 'what', 'what'],
         'should': ['be'],
         'to': ['be', 'be'],
         'used': ['to'],
         'we': ['are', 'should', 'are', 'need', 'are', 'used'],
         'what': ['we', 'we', 'we']})

If you want the adjacent words to be unique, use defaultdict(set) instead of defaultdict(list), and instead of append, use update, passing a list containing next_word (or simply add(next_word)).
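
A minimal sketch of that set-based variant, keeping the same loop as above. Keep in mind that sets do not preserve insertion order, so sorted() is used here only to get a predictable printout.

```python
import collections

texto = "We are not what we should be We are not what we need to be But at least we are not what we used to be"
words = texto.lower().split()

# A set per key keeps only the distinct words that follow it
adjacente = collections.defaultdict(set)

for i, word in enumerate(words[:-1]):
    next_word = words[i + 1]
    adjacente[word].add(next_word)  # same effect as .update([next_word])

print(sorted(adjacente['we']))  # ['are', 'need', 'should', 'used']
print(adjacente['are'])         # {'not'}
```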

0

I managed to do it this way as well:

texto = "We are not what we should be We are not what we need to be But at least we are not what we used to be"

lista = texto.lower().split()

dic = {}

for i in range(len(lista) - 1):
    current = lista[i]
    next_ = lista[i + 1]
    if current not in dic:
        dic[current] = []

    dic[current].append(next_)

print(dic)
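
The `if current not in dic` check can also be folded into a single call with dict.setdefault, which returns the existing list for a key or inserts and returns an empty one. A sketch of that variant, keeping the same range-based loop:

```python
texto = "We are not what we should be We are not what we need to be But at least we are not what we used to be"

lista = texto.lower().split()

dic = {}

for i in range(len(lista) - 1):
    # setdefault returns dic[lista[i]], creating it as [] the first time
    dic.setdefault(lista[i], []).append(lista[i + 1])

print(dic['be'])  # ['we', 'but']
```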
