How do I remove alphabetic characters from a column of a pd.Series? (Python)

My question is quite simple. Given a pd.Series like the one shown below, how do I remove the "ANOS" (years) and "MESES" (months) text from it? I looked through the Pandas documentation but unfortunately could not find a way.

I created the following variable, which returns a pd.Series:

In:

idade_serie = dataframe['Idade'].value_counts()

Out:

80 ANOS     91
70 ANOS     85
73 ANOS     82
75 ANOS     81
76 ANOS     79
            ..
103 ANOS     1
17 ANOS      1
4 MESES      1
26 ANOS      1
19 AN0S      1
Name: Idade, Length: 109, dtype: int64
    Be careful with how you want to count these values: once the strings "ANOS" and "MESES" are removed, the numbers are merged in the value_counts() totals, and it is no longer possible to tell years from months.
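To illustrate the point in this comment, here is a minimal sketch that keeps the unit in its own column instead of discarding it. The sample values and the names `idade`, `partes`, `valor` and `unidade` are hypothetical, and the pattern assumes every entry has the form "number, space, unit".

```python
import pandas as pd

# Hypothetical sample data; 'Idade' mirrors the column name in the question.
idade = pd.Series(['80 ANOS', '4 MESES', '4 ANOS', '80 ANOS'], name='Idade')

# Capture the leading digits and the unit word as two separate columns.
partes = idade.str.extract(r'^\s*(?P<valor>\d+)\s+(?P<unidade>\S+)\s*$')
partes['valor'] = partes['valor'].astype(int)

# Counting on both columns keeps "4 MESES" distinct from "4 ANOS".
contagem = partes.groupby(['valor', 'unidade']).size()
print(contagem)
```

Rows that do not match the pattern would come back as NaN, so real data may need a check before the `astype(int)` step.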

2 answers

0

You can do this by splitting each string on the space with .str.split() and selecting the first element of the resulting list; after that, call value_counts().

import pandas as pd

data = {'Idade': ['80 ANOS', '80 ANOS', '80 ANOS', '80 ANOS', '80 ANOS',
                  '80 ANOS', '80 ANOS', '70 ANOS', '70 ANOS', '70 ANOS',
                  '73 ANOS ', '73 ANOS ', '73 ANOS ', '73 ANOS ']}

df = pd.DataFrame(data)
df['Idade'].str.strip().str.split(' ').str[0].value_counts()
# output:
80    7
73    4
70    3

The split function breaks each value of the Idade column at the space, turning every string into a list:

df['Idade'].str.strip().str.split(' ')
# output:
0     [80, ANOS]
1     [80, ANOS]
2     [80, ANOS]
3     [80, ANOS]
4     [80, ANOS]
5     [80, ANOS]
6     [80, ANOS]
7     [70, ANOS]
8     [70, ANOS]
9     [70, ANOS]
10    [73, ANOS]
11    [73, ANOS]
12    [73, ANOS]
13    [73, ANOS]

and the final .str[0] selects the first element of each list:

df['Idade'].str.strip().str.split(' ').str[0]
# output:
0     80
1     80
2     80
3     80
4     80
5     80
6     80
7     70
8     70
9     70
10    73
11    73
12    73
13    73
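As a variation on the split approach above, the leading number can also be pulled out with a regular expression. This is only a sketch: the sample values are hypothetical, and it assumes the `19 AN0S` entry in the question's output really contains a zero digit. It also shows why simply deleting every non-digit character is risky on data like that.

```python
import pandas as pd

# Hypothetical sample, including the mistyped "AN0S" entry (zero instead of O).
idade = pd.Series(['80 ANOS', '73 ANOS ', '19 AN0S', '4 MESES'], name='Idade')

# Keep only the leading run of digits; expand=False returns a Series.
numeros = idade.str.extract(r'^\s*(\d+)', expand=False)

# Removing every non-digit character also keeps the stray 0 from "AN0S",
# so "19 AN0S" turns into "190" instead of "19".
sem_letras = idade.str.replace(r'\D', '', regex=True)

print(numeros.tolist())
print(sem_letras.tolist())
```

Both results are still strings; `numeros.astype(int)` would convert them if numeric values are needed.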

0

Hi, all good? One way is to use the replace method, but for it to work you must first convert your Series to a string.

I ran a test using your case, and it looked like this:

import pandas as pd

data = {'Idade': ['80 ANOS', '80 ANOS', '80 ANOS', '80 ANOS', '80 ANOS',
                  '80 ANOS', '80 ANOS', '70 ANOS', '70 ANOS', '70 ANOS',
                  '73 ANOS ', '73 ANOS ', '73 ANOS ', '73 ANOS ']}

df = pd.DataFrame(data)
idade_serie = df['Idade'].value_counts()

print(df)
print('-'*40)
print(str(idade_serie).replace("ANOS", ""))
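Note that `str(idade_serie).replace(...)` only changes the printed text, not the Series itself. If the cleaned labels are needed for further work, one possible sketch is to apply the string method to the index of the `value_counts()` result, since that is where the labels live. The shortened sample data here is hypothetical.

```python
import pandas as pd

# Hypothetical, shortened version of the sample data used above.
data = {'Idade': ['80 ANOS', '80 ANOS', '70 ANOS', '73 ANOS ']}
df = pd.DataFrame(data)

# Strip stray whitespace first so '73 ANOS ' and '73 ANOS' count together.
idade_serie = df['Idade'].str.strip().value_counts()

# The labels are stored in the index, so the replacement is applied there.
idade_serie.index = idade_serie.index.str.replace(' ANOS', '', regex=False)
print(idade_serie)
```

The comment on the question still applies: if the data mixes "ANOS" and "MESES", dropping both would merge unrelated counts.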

If you know of another way, share it with us. Cheers!
