How to change abbreviated values in a DF using Pandas in Python

Question

How to change abbreviated values in a DF using Pandas in Python

Asked 4 years, 6 months ago

Viewed 53 times

2

I have the following problem, I have a df with more than 50 columns, in these columns, some have value of type "35,57B", "6,85T". How do I make iteratively and replace the examples cited above so that they stay as follows "3557000000","6850000000000"?

I tried something like

for col in df:
    df.col = (df.col.replace(r'[KMBT]+$', '', regex=True).astype(float) * df.col.str.extract(r'[\d\.]+([KMBT]+)', expand=False).fillna(1).replace(['K','M','B','T'], [10**3, 10**6, 10**9, 10**12]).astype(int))

But I was unsuccessful, and had the following mistake:

AttributeError: 'DataFrame' object has no attribute 'col'

Thanks for the answers with them it was possible to elaborate the solution. Which was the following:

for col in df.columns[1:]:    
    df[col] = (df[col].str.replace(',','.').replace('N/A','0').replace(r'[kMBT%]+$', '', regex=True).astype(float) * df[col].str.extract(r'[\d\.]+([kMBT%]+)', expand=False).fillna(1).replace(['k','M','B','T',"%"], [10**3, 10**6, 10**9, 10**12,1]).astype(float))
    df[col] = df[col].astype(str)
    df[col] = df[col].str.replace('.',',')

Can provide an example dataset?

– Lucas

2021/01/25 at 23:10

2 answers

3

One option is to define a function, use list comprehension and iterate over df.columns:

def transform_str(value:str):
    return value.replace(",","").replace("B","000000").replace("T","0000000000")

for col in df.columns:
    df[col]=[transform_str(k) for k in df[col]]

Good afternoon! Then your solution generates the following output 3562000000, 6850000000000 only the values are for example 35,63B and 6,87T, the data is not all in the same format, has data like "48,1B". For this reason I was looking for something that multiplied the existing value by, for example, replacing "B" with "10**9". But thank you so much gave me an insight into how to solve!

– Dataworm1523

2021/01/26 at 15:10
1

I get it. But these were the examples you gave. To have a complete solution, I suggest sharing the database or a part of it.

– Lucas

2021/01/26 at 15:13

Browser other questions tagged python pandas date

You are not signed in. Login or sign up in order to post.

by lmonferrari • **3,550** points · Answer 1 · 2021-01-25T23:40:23+00:00

In addition to the response of Lucas (that I even prefer), you can keep the basis of your code

Creating the test data frame

import pandas as pd
dados = ["35,57B", "6,85T"]
df = pd.DataFrame({'Dados': dados})
df

Displaying the data frame

     Dados
0   35,57B
1    6,85T

I added an extra replace and in regex I put a comma in place of the point, astype changed to int in both conditions

df['Dados'] = df['Dados'].str.replace(',','').replace(r'[KMBT]+$', '', regex=True).astype('int64') * df['Dados'].str.extract(r'[\d\,]+([KMBT]+)', expand = False).replace(['K','M','B','T'], [10**3, 10**6, 10**9, 10**12]).astype('int64')

If you want to save as string

df['Dados'] = df['Dados'].astype(str)

Exit:

0      3557000000000
1    685000000000000