Changing the data in an entire table column - Python

Viewed 632 times

0

The battle with the "for" loop continues. I've run into a problem I can't solve. I have a table in which all the data is of type string. In one column, each row contains letters and numbers. That in itself is not a problem: I wrote a "for" loop and sliced each string like this:

i = []
for i in tabela['Valor']:
    i = i[3:7] # this worked perfectly, but I can't replace the values in the column.
    #tabela['Valor'] = i # this attempt doesn't work: the whole column ends up with a single value
    print(i)

I also tried the following, but it only "saves" the last value from the loop.

data = pd.DataFrame()
data['Valor'] = tabela['Valor']
i = []
for i in data['Valor']:
    i = i[0:8]
    print(i)
    tabela['Valor'] = i
    tabela = tabela.replace(tabela['Valor'])

When I print the table, every value in the 'Valor' column is the same. What is wrong?

Part of the dataset to be changed:

tabela['Valor']
output:
0                            R$ 0,02 por ação ITAUSA ON
1                 R$ 0,02120838006 por ação BANESTES ON
2                R$ 0,06270328053 por ação IHPARDINI ON
3                  R$ 0,816430716 por ação BRB BANCO ON
4                 R$ 0,02120838006 por ação BANESTES ON
5                            R$ 0,02 por ação ITAUSA ON
6             R$ 0,23690201678 por ação TELEF BRASIL ON
7                    R$ 6,4876594827 por ação COPASA ON
8                    R$ 0,02310 por ação CELUL IRANI ON
9     R$ 0,907735758004401 por ação SUL AMERICA UNT ...
10                  R$ 0,19477374027 por ação BRASIL ON
11                R$ 0,7158810908 por ação UNIPAR ON N1
12                  R$ 0,05652210279 por ação TAESA UNT
13                  R$ 1,19237597898 por ação TAESA UNT
14               R$ 0,19813998 por ação SLC AGRICOLA ON
15               R$ 0,34644526 por ação SAO MARTINHO ON
16                    R$ 0,2551403880 por ação TEGMA ON
17                    R$ 0,0850467960 por ação TEGMA ON
18             R$ 0,03264206955 por ação AES TIETE E ON

The ultimate goal is to shorten each string so that only the first 4 digits of the value remain.

  • Hi, could you post at least a piece of the dataset?

  • Hello Paul, done... see if it’s clearer now.

1 answer

3


Let's go through the steps using only pandas.

Create the DataFrame

>>> import pandas as pd

>>> df = pd.DataFrame({"A": ["R$ 0,921222 qualquer coisa", "R$ 1,2345 outra coisa", "R$ 0,32212 ultima coisa"]})

>>> df

                            A
0  R$ 0,921222 qualquer coisa
1       R$ 1,2345 outra coisa
2     R$ 0,32212 ultima coisa

Copy only the part you want into a new column

>>> df["B"] = df["A"].str[3:7]

>>> df
                            A     B
0  R$ 0,921222 qualquer coisa  0,92
1       R$ 1,2345 outra coisa  1,23
2     R$ 0,32212 ultima coisa  0,32
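
For context on why the loop in the question leaves the same value in every row: assigning a single string to tabela['Valor'] broadcasts it to the whole column, so each pass overwrites all rows and only the last slice survives. A minimal sketch of the contrast, using three rows taken from the question's sample (the names valores, tabela_loop and tabela_vec are illustrative only):

```python
import pandas as pd

valores = ["R$ 0,02 por ação ITAUSA ON",
           "R$ 0,02120838006 por ação BANESTES ON",
           "R$ 6,4876594827 por ação COPASA ON"]

# Loop version, as in the question: each assignment writes one scalar
# to every row, so only the slice from the last iteration survives.
tabela_loop = pd.DataFrame({"Valor": valores})
for texto in valores:
    tabela_loop["Valor"] = texto[3:7]

# Vectorized version: .str[3:7] slices every row independently,
# and the result can overwrite the original column directly.
tabela_vec = pd.DataFrame({"Valor": valores})
tabela_vec["Valor"] = tabela_vec["Valor"].str[3:7]

print(tabela_loop["Valor"].tolist())
print(tabela_vec["Valor"].tolist())
```

The second form needs no loop at all, and it can write to the same column instead of a new one if the original text is no longer needed.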

Post update 22/11/2020

As suggested by Flavio Moraes in the comments, a more appropriate approach would be to use regular expressions to find the values.

Here is how to do it:

>>> df["C"] = df["A"].str.extract(r'(\d+,\d+)')

>>> df
                            A     B         C
0  R$ 0,921222 qualquer coisa  0,92  0,921222
1       R$ 1,2345 outra coisa  1,23    1,2345
2     R$ 0,32212 ultima coisa  0,32   0,32212

Note that, at this point, both columns are strings and use the comma as the decimal separator.
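
A small sketch of how str.extract might behave on less regular input, assuming some rows could have more than one digit before the comma or no value at all. The second and third sample rows are invented for illustration, and expand=False is used here only so that the result is a Series rather than a one-column DataFrame:

```python
import pandas as pd

s = pd.Series(["R$ 0,02 por ação ITAUSA ON",
               "R$ 12,5 por ação EXEMPLO ON",  # invented row: two digits before the comma
               "sem valor"])                    # invented row: nothing to extract

# \d+ on both sides of the comma accepts any number of digits;
# rows without a match come back as NaN instead of raising an error.
extraido = s.str.extract(r"(\d+,\d+)", expand=False)

print(extraido.tolist())
```

Rows that yield NaN can then be inspected or dropped before converting to numbers.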

End of update

pandas uses a dot as the decimal separator, and we still have a comma.

Convert the comma to a dot

>>> df["B"] = df["B"].str.replace(",", ".")

>>> df
                            A     B
0  R$ 0,921222 qualquer coisa  0.92
1       R$ 1,2345 outra coisa  1.23
2     R$ 0,32212 ultima coisa  0.32

Convert the string to float

>>> df["B"] = df["B"].astype(float)

>>> df
                            A     B
0  R$ 0,921222 qualquer coisa  0.92
1       R$ 1,2345 outra coisa  1.23
2     R$ 0,32212 ultima coisa  0.32

Done. Now just work with the numbers in column B (in this example).

Continuation of the update

At this point, if you have replaced the comma with a dot and converted both columns to float, you would have:
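
The two conversion steps above can also be sketched in one pass. This version assumes the column might contain rows that do not follow the "R$ value ..." layout (the last row below is invented), in which case astype(float) would raise a ValueError; pd.to_numeric with errors="coerce" turns such rows into NaN instead:

```python
import pandas as pd

df = pd.DataFrame({"A": ["R$ 0,921222 qualquer coisa",
                         "R$ 1,2345 outra coisa",
                         "sem valor"]})  # invented row that does not fit the layout

# Slice, swap the decimal separator (regex=False treats "," literally),
# then convert; anything that is not a number becomes NaN.
fatia = df["A"].str[3:7].str.replace(",", ".", regex=False)
df["B"] = pd.to_numeric(fatia, errors="coerce")

print(df["B"].tolist())
```

Whether silently producing NaN is preferable to failing loudly depends on the data; for a clean column like the one in the question, astype(float) is enough.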

>>> df
                            A     B         C
0  R$ 0,921222 qualquer coisa  0.92  0.921222
1       R$ 1,2345 outra coisa  1.23  1.234500
2     R$ 0,32212 ultima coisa  0.32  0.322120

Notice that column C keeps all the decimal places found by the regular expression. If you want to show only two, just configure pandas accordingly (this changes only how floats are displayed, not the stored values):

>>> pd.options.display.float_format = "{:,.2f}".format

>>> df
                            A    B    C
0  R$ 0,921222 qualquer coisa 0.92 0.92
1       R$ 1,2345 outra coisa 1.23 1.23
2     R$ 0,32212 ultima coisa 0.32 0.32

End of further update
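
Since pd.options.display.float_format affects only how numbers are printed, the stored values still carry every decimal place. If the values themselves should have two decimals, Series.round is one possible option. A brief sketch (the column name C_2 is made up for illustration; the last value comes from the question's sample):

```python
import pandas as pd

df = pd.DataFrame({"C": [0.921222, 1.2345, 0.32212, 0.816430716]})

# round() returns new values and leaves the display options alone.
df["C_2"] = df["C"].round(2)

print(df["C_2"].tolist())
```

Note that rounding and slicing can disagree: the string slice [3:7] truncates "0,816430716" to "0,81", while rounding the full number gives 0.82.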

I hope it helps.

  • Amazing! It was perfect....

  • Paul, see if you can help me again: this DataFrame has 2 columns with the same name. How can I change that? Is it possible?

  • 1

    If you find this answer pertinent, mark it as the solution to your problem. To answer: if there are not many columns, you can do df.columns = ["A", "B", "C"...]. Next time, avoid asking a new question in the comments; open another question instead.

  • 1

    Paul, I upvoted your answer because, besides being functional, it is quite didactic. However, I believe that using regex to get the part of the string that holds the value is important not only for dealing with formatting problems, but also for cases where the values have more or fewer than 4 digits. Another, less safe option would be to use split or strip, which would still be better than a substring.

  • 1

    @Flaviomoraes, thank you very much for the regex suggestion. I updated the post to include it.

  • @Paulomarques, excellent solution

  • 1

    I believe that in this specific case regex is not necessary (it increases the complexity and may take longer), given that the values are well defined and there is no need to search for them inside more complex text. Maybe a split would be enough. But the answer is very good. A big hug and an upvote for the answer!
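
The split alternative mentioned in the comments could look roughly like the sketch below. It assumes the value is always the second whitespace-separated token, which holds for the sample rows in the question but is not guaranteed for other data:

```python
import pandas as pd

s = pd.Series(["R$ 0,02 por ação ITAUSA ON",
               "R$ 6,4876594827 por ação COPASA ON"])

# Split on whitespace and take the second token (index 1),
# then swap the decimal separator and convert to float.
valor = s.str.split().str[1]
numero = valor.str.replace(",", ".", regex=False).astype(float)

print(numero.tolist())
```

Unlike the fixed slice [3:7], this keeps every digit of the value regardless of its length.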
