String handling with Python and pandas

I'm trying to write a function that goes through a dataset, removes characters such as '?' and '*' from its strings, and returns the dataset with the cleaned columns.

An example dataset:


import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 'ds??', 'fgfs', 0],
                   [3, 'dsda#..*', np.nan, 1],
                   [np.nan, '1 ??d', np.nan, 5],
                   [np.nan, 'v2', 0, 4]],
                  columns=list('ABCD'))

I would like a function that returns the columns without characters such as # and ?, keeping only numbers and letters.

I need a somewhat more generic function that works on any dataset, recognizes the columns of dtype object, and cleans them. This is what I have tried so far:


def tratar_str2(df):
    for col in df.columns:
        if df[col].dtype.name == df['object']:
            for k , v in enumerate(df['object']):
                df[v] = re.sub(r'(?<![a-z])-|-(?![a-z])', '',df[v], flags=re.IGNORECASE)
    return df

1 answer

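You can use the string method replace to remove the characters you want. In pandas, string methods are available through the .str accessor of the column of interest. Pass regex=False so that ? is treated as a literal character and not as a regex quantifier. For example, to remove ? from column B (the accessor only works on columns that hold strings, so it would fail on the numeric column A):

df['B'].str.replace('?', '', regex=False)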
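
As a minimal sketch of the difference between literal and regex replacement: the Series s and the variable names below are made up for illustration, with values borrowed from column B of the question. The character class [^A-Za-z0-9] is one possible reading of "only numbers and letters"; it also drops spaces and accented letters.

```python
import numpy as np
import pandas as pd

# Illustrative data only: the strings from column B of the question, plus a NaN.
s = pd.Series(['ds??', 'dsda#..*', '1 ??d', 'v2', np.nan])

# Literal replacement: '?' is a plain character here, not a regex quantifier.
no_question_marks = s.str.replace('?', '', regex=False)

# Regex replacement: remove every character that is not an ASCII letter or digit.
alnum_only = s.str.replace(r'[^A-Za-z0-9]', '', regex=True)

print(no_question_marks.tolist())
print(alnum_only.tolist())
```

Missing values pass through both calls as NaN, so they do not need to be handled separately.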

To remove the same character from several columns, use a for loop. This assumes that A, B and C are all string columns, because the .str accessor raises an AttributeError on numeric columns:

for k in ['A', 'B', 'C']:
    df[k] = df[k].str.replace('?', '', regex=False)
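
For the more generic case in the question, one possible sketch is a function that finds the text columns by itself and cleans only the values that are real strings. The name clean_object_columns and its pattern parameter are hypothetical, not part of pandas, and the default pattern assumes that "letters and numbers" means ASCII letters and digits.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def clean_object_columns(df, pattern=r'[^A-Za-z0-9]'):
    """Return a copy of df with `pattern` removed from the strings in text columns.

    Hypothetical helper: the default pattern keeps only ASCII letters and digits.
    """
    out = df.copy()
    for col in out.columns:
        # Skip numeric columns; only object or string columns can hold text.
        if not (is_object_dtype(out[col]) or is_string_dtype(out[col])):
            continue
        # Touch only real strings, so NaN and numbers in mixed columns stay as they are.
        is_str = out[col].map(lambda v: isinstance(v, str))
        out.loc[is_str, col] = out.loc[is_str, col].str.replace(pattern, '', regex=True)
    return out


df = pd.DataFrame([[np.nan, 'ds??', 'fgfs', 0],
                   [3, 'dsda#..*', np.nan, 1],
                   [np.nan, '1 ??d', np.nan, 5],
                   [np.nan, 'v2', 0, 4]],
                  columns=list('ABCD'))

cleaned = clean_object_columns(df)
print(cleaned)
```

The boolean mask is there because calling .str.replace directly on a mixed column such as C would turn the non-string value 0 into NaN. The function works on a copy, so the original df is left unchanged.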
