How to make string substitutions at scale from indexes stored in tuples?


I need to find the positions of certain words in a string, store those positions, and then use them to perform substitutions: each tuple records which word should be replaced and where it is (its index) in the string. The goal is to turn unofficial spelling into official spelling. The index is required because I need to know where in the text the reverse operation can be applied later, since not every word appears in its unofficial spelling. For example, not every "ela" is written as "ea", not every "com" as "c'", and so on. Substitutions that do not record the index therefore do not solve my problem. So far I have found the indexes of the substitutions to be made, but I don’t know how to apply them all at once. The example string is as follows:

texto = """
ontem ea foi lá cum uma cara fea demais
e' falou c'ocê na parte da tarde
o' como que eu tô tá vendo
  """

From the string, I use a regular expression that gives me the following tuples:

tuplas1 = [
    ("ela", "ea"),
    ("com", "cum"),
    ("feia", "fea"),
    ("ele", "e'"),
    ("com você", "c'ocê"),
    ("olha", "o'"),
    ("estou", "tô"),
    ("está", "tá"),
]

To get the index of each unofficially spelled word in the string, I match each word against the second element of the tuples in tuplas1. I did it like this:

lista_idx = []
# x is the word position, y is the word; z is an (official, unofficial) pair
for x, y in enumerate(texto.split()):
    for z in tuplas1:
        if y == z[1]:
            lista_idx.append((x, y, z[0]))

Result:

lista_idx = [
    (1, "ea", "ela"),
    (4, "cum", "com"),
    (7, "fea", "feia"),
    (9, "e'", "ele"),
    (11, "c'ocê", "com você"),
    (16, "o'", "olha"),
    (20, "tô", "estou"),
    (21, "tá", "está"),
]
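A possible variation on the loop above, sketched under the assumption that each unofficial form maps to exactly one official form: building a dictionary from tuplas1 removes the inner loop. The names indexar and oficial_por_informal are hypothetical, introduced only for this illustration.

```python
texto = """
ontem ea foi lá cum uma cara fea demais
e' falou c'ocê na parte da tarde
o' como que eu tô tá vendo
  """

tuplas1 = [
    ("ela", "ea"),
    ("com", "cum"),
    ("feia", "fea"),
    ("ele", "e'"),
    ("com você", "c'ocê"),
    ("olha", "o'"),
    ("estou", "tô"),
    ("está", "tá"),
]


def indexar(texto, tuplas):
    """Hypothetical helper: return (position, unofficial, official) tuples."""
    # Assumption: every unofficial form has a single official counterpart.
    oficial_por_informal = {informal: oficial for oficial, informal in tuplas}
    return [
        (posicao, palavra, oficial_por_informal[palavra])
        for posicao, palavra in enumerate(texto.split())
        if palavra in oficial_por_informal
    ]


lista_idx = indexar(texto, tuplas1)
print(lista_idx)
```

The dictionary lookup does one check per word instead of one comparison per word per tuple, which may matter when the text and the tuple list are both large.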

Now I need to use the third element of each tuple in lista_idx to replace the word at the position given by that tuple's first element. In the example below, I split the string into a list and replaced element [1] of texto_lista with element [2] of the first tuple (index [0]) of lista_idx; that is, I replaced "ea" with "ela":

lista_idx = [(1, 'ea', 'ela')]
texto_lista = texto.split()
texto_lista[1] = lista_idx[0][2]

Expected result: the sentence "ontem ea foi lá cum uma cara fea demais" should become "ontem ela foi lá com uma cara feia demais". How can I do this for all the indexes stored in the first element of each tuple in lista_idx?

  • What should be the result? If you only want to use the tuples to replace in the original string, this would be enough: https://ideone.com/Ml40nF

  • Thanks for trying to help. The index is required because I need to know where in the text the reverse operation can be applied later, since not every word appears in its unofficial spelling. For example, not every "ela" is written as "ea", not every "com" as "c'", etc. I edited my post with this information. Unfortunately, a plain replace does not solve it. Expected result: the sentence "ontem ea foi lá cum uma cara fea demais" should become "ontem ela foi lá com uma cara feia demais".

  • I'm not sure I understood, but anyway, here is another alternative: https://ideone.com/0mzauf

  • You want to replace words based on their position?

  • That’s right, @hkotsubo. I managed to do it with a for loop. I turned the whole text into a list and did "for x in lista_idx: texto_lista[x[0]] = x[2]". That way the list receives, at position x[0], the third element of the tuple. Then I just join it back with " ".join. Anyway, thanks for your help.

  • @Paulomarques I wanted to replace the second word of the tuple (1, "ea", "ela") in the text I passed. I put the text in a list and looped over the list of tuples: "for x in lista_idx: texto_lista[x[0]] = x[2]", so that the list receives, at position x[0], the third element of the tuple, x[2]. The idea behind it is fairly involved and the code is very simple, but I racked my brain over it for a couple of days.


1 answer


I got it, guys. I used the single-substitution principle from the question (assigning one tuple's third element to texto_lista at that tuple's position), but inside a for loop, without hard-coding which tuple to use, since every tuple is visited. In the loop I use x[2] for the third element of each tuple. This replaces every unofficially spelled word, at position x[0], with its official spelling, x[2]:

texto_lista = texto.split()

for x in lista_idx:
    texto_lista[x[0]] = x[2]

print(texto_lista)
['ontem', 'ela', 'foi', 'lá', 'com', 'uma', 'cara', 'feia', 'demais', 'ele', 'falou', 'com você', 'na', 'parte', 'da', 'tarde', 'olha', 'como', 'que', 'eu', 'estou', 'está', 'vendo']

Then just turn it back into a string with " ".join:

" ".join(texto_lista)
Out[30]: 'ontem ela foi lá com uma cara feia demais ele falou com você na parte da tarde olha como que eu estou está vendo'

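Since the question keeps the indexes so that the substitution can be undone later, here is a minimal sketch of the round trip, assuming lista_idx has the (position, unofficial, official) shape shown above. The function names aplicar and reverter are hypothetical, and the example uses only the first line of the text to stay short.

```python
def aplicar(palavras, lista_idx):
    """Hypothetical helper: return a copy with official spellings applied."""
    resultado = list(palavras)  # copy, so the caller's list is not mutated
    for posicao, _informal, oficial in lista_idx:
        resultado[posicao] = oficial
    return resultado


def reverter(palavras, lista_idx):
    """Hypothetical helper: restore the unofficial spellings by position."""
    resultado = list(palavras)
    for posicao, informal, _oficial in lista_idx:
        resultado[posicao] = informal
    return resultado


texto = "ontem ea foi lá cum uma cara fea demais"
lista_idx = [(1, "ea", "ela"), (4, "cum", "com"), (7, "fea", "feia")]

palavras = texto.split()
oficial = aplicar(palavras, lista_idx)
print(" ".join(oficial))   # ontem ela foi lá com uma cara feia demais

original = reverter(oficial, lista_idx)
print(" ".join(original))  # ontem ea foi lá cum uma cara fea demais
```

Two caveats follow from how split and join behave. The stored positions refer to the list of words, so the reverse step should run on that list rather than on a re-split of the joined string: a replacement such as "com você" becomes two words after joining and splitting again, which shifts every later position by one. Also, split() with no arguments discards the line breaks of the original text, so the joined result is a single line.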