Removing Symbols in Python dataframe columns

Question

Asked 4 years, 6 months ago

Viewed 62 times

I scraped some data with Python and that part works, but the resulting table contains stray characters: ( ), [ ], \n and -. I would like a Python function to clean the DataFrame shown in the figure below, column by column if necessary, so that only the names remain. Thank you!

I tried:

#dataFrame.replace(to_replace='[' , value = "") 
#DataFrame.replace (to_replace = None, value = None, inplace = False, limit = None, regex = False, method = 'pad')

and

def remove_str(txt):
    list_remove = ['[', ']', '\n', '-', ' ']
    for t in list_remove:
        txt = txt.strip(t)
    return txt
tempdata = pd.DataFrame(columns=['Time','Value'])
data['titulo'] = data.apply(lambda row : remove_str(row['titulo']), axis=1)

1 answer

by lmonferrari • **3,550** points · Answer 1 · 2021-01-15T22:57:09+00:00

Test DataFrame

import pandas as pd
import re

titulo = ['[Cobra Kai]', '[Bridgerton]', '[Vikings]']
genero = ['[\nAction, Comedy, Drama]', '[\nDrama, Romance]','[\nAction,Adventure, Drama]']
ano = ['[(2018)]','[(2020)]','[(2013-2020)]']

df = pd.DataFrame({'Titulo': titulo, 'Genero': genero, 'Ano':ano})

Code

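# Characters to remove: brackets, newline, parentheses and spaces
simbolos = '][\n)( '
# Wrap them in a regex character class
pattern = "[" + simbolos + "]"

# Apply re.sub to every cell (in pandas >= 2.1, DataFrame.map replaces the deprecated applymap)
df = df.applymap(lambda x: re.sub(pattern, '', x))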

Create the test DataFrame
Define the symbols to be removed
Build the regex pattern
Apply the re.sub function to every cell of the DataFrame
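
Since the question mentions cleaning column by column, the same pattern can also be applied one column at a time. This is a minimal sketch, assuming every cell in the chosen columns is a string. It swaps `applymap` for the vectorized `Series.str.replace` with `regex=True`, which is not the method used in the answer above, and the two-column DataFrame is just sample data.

```python
import pandas as pd

# Sample data in the same shape as the test DataFrame above
df = pd.DataFrame({
    'Titulo': ['[Cobra Kai]', '[Bridgerton]', '[Vikings]'],
    'Ano': ['[(2018)]', '[(2020)]', '[(2013-2020)]'],
})

# Same character class as in the answer: ] [ newline ) ( and space
simbolos = '][\n)( '
pattern = "[" + simbolos + "]"

# Clean one column at a time with the vectorized string method
for col in ['Titulo', 'Ano']:
    df[col] = df[col].str.replace(pattern, '', regex=True)

print(df)
```

Listing the columns explicitly makes it easy to skip columns that should stay untouched.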

Output

       Titulo                  Genero        Ano
0    CobraKai     Action,Comedy,Drama       2018
1  Bridgerton           Drama,Romance       2020
2     Vikings  Action,Adventure,Drama  2013-2020
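
Because the character class includes a space, spaces inside the names are removed too ('Cobra Kai' becomes 'CobraKai'). If that is not wanted, one possible variant is sketched below. It assumes inner spaces should be kept and only leading or trailing whitespace trimmed; `pattern_keep_spaces` and `clean_cell` are hypothetical names, not part of the answer.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'Titulo': ['[Cobra Kai]', '[Bridgerton]', '[Vikings]'],
    'Genero': ['[\nAction, Comedy, Drama]', '[\nDrama, Romance]', '[\nAction,Adventure, Drama]'],
    'Ano': ['[(2018)]', '[(2020)]', '[(2013-2020)]'],
})

# No space in the character class, so spaces inside names survive
pattern_keep_spaces = r"[\[\]()\n]"

def clean_cell(x):
    # Remove brackets, parentheses and newlines, then trim the ends
    return re.sub(pattern_keep_spaces, '', x).strip()

for col in df.columns:
    df[col] = df[col].apply(clean_cell)

print(df)
```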

Very good, this code works on this new table! But when I use it on my DataFrame, it doesn't work. Could you apply this code to the existing one, named dataframe (all lowercase), instead of creating a new one? simbolos = '][\n)( ' pattern = "[" + simbolos + "]" df = df.applymap((lambda x: re.sub(pattern, '', x)))

– M Data Science

2021/01/17 at 13:57
Very good, this code works on this new table! But when I use it on my DataFrame, it doesn't work. Could you apply this code to the existing one, named dataframe (all lowercase), instead of creating a new one? That is, use dataframe in place of the new table below: titulo = ['[Cobra Kai]', '[Bridgerton]', '[Vikings]'] genero = ['[\nAction, Comedy, Drama]', '[\nDrama, Romance]','[\nAction,Adventure, Drama]'] ano = ['[(2018)]','[(2020)]','[(2013-2020)]']

– M Data Science

2021/01/17 at 14:05
@Mdatascience, good afternoon! I didn't quite understand your comment. Did it work for you or not? If it didn't, post your real data in the question so I can test it and help you more appropriately. Cheers!

– lmonferrari

2021/01/17 at 16:59
Your code works, but it didn't work in my code, that is, on my existing table. Yes, ideally you would test against the actual code, because I've been trying in various ways to understand why it gives an error. I'm new here; I can post the code (it's for study), but it isn't formatted.

– M Data Science

2021/01/18 at 10:28
@Mdatascience, good morning! Put your data on some cloud hosting service and add the link to the question, so I can better analyze what is happening. Cheers!

– lmonferrari

2021/01/18 at 10:31
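
The real data was never posted in this thread, so the cause of the error is unknown. One possibility, and this is only an assumption, is that some cells in the scraped table are not strings (for example lists or missing values), in which case `re.sub` raises a `TypeError`. A minimal sketch of a more defensive cleaner follows; `clean_any` is a hypothetical helper, and the `dataframe` contents are made-up sample data using the names mentioned in the question.

```python
import re
import pandas as pd

# Same character class as in the answer
pattern = "[" + '][\n)( ' + "]"

def clean_any(x):
    # Lists or tuples (assumed scraper output) are joined into one string first
    if isinstance(x, (list, tuple)):
        x = ', '.join(str(item) for item in x)
    # Missing values (None, NaN) are returned unchanged
    elif pd.isna(x):
        return x
    # Everything else is converted to str before the substitution
    return re.sub(pattern, '', str(x))

# Made-up sample mixing a list, a string and a missing value
dataframe = pd.DataFrame({'titulo': [['Cobra Kai'], '[Bridgerton]', None]})
dataframe['titulo'] = dataframe['titulo'].apply(clean_any)

print(dataframe)
```

Checking `dataframe.dtypes` and `type(dataframe['titulo'].iloc[0])` on the real table would show whether this assumption applies.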