Removing Symbols in Python dataframe columns

I did some web scraping in Python and it went fine so far, except that the resulting table contains ( ), [ ], \n and -. I would like a Python function to clean the dataframe shown in the figure below (it can work column by column) so that only the names remain. Thank you!

I tried to:

#dataFrame.replace(to_replace='[' , value = "") 
#DataFrame.replace (to_replace = None, value = None, inplace = False, limit = None, regex = False, method = 'pad') 

and

def remove_str(txt):
    # Note: str.strip only trims these characters from the two ends of the string
    list_remove = [' [', ' ]', ' \n', ' -', ' ']
    for t in list_remove:
        txt = txt.strip(t)
    return txt
tempdata = pd.DataFrame(columns=['Time','Value'])
data['titulo'] = data.apply(lambda row : remove_str(row['titulo']), axis=1)

Tables (dataframes) of the films, organized by genre and by year.

1 answer

1


Test data frame

import pandas as pd
import re

titulo = ['[Cobra Kai]', '[Bridgerton]', '[Vikings]']
genero = ['[\nAction, Comedy, Drama]', '[\nDrama, Romance]','[\nAction,Adventure, Drama]']
ano = ['[(2018)]','[(2020)]','[(2013-2020)]']

df = pd.DataFrame({'Titulo': titulo, 'Genero': genero, 'Ano':ano})

Code

simbolos = '][\n)( '
pattern = "[" + simbolos + "]"

df = df.applymap((lambda x: re.sub(pattern, '', x)))

  1. Create the test data frame
  2. Define the symbols to be removed
  3. Build the regular-expression pattern (a character class containing those symbols)
  4. Apply re.sub to every cell of the data frame

Output

       Titulo                  Genero        Ano
0    CobraKai     Action,Comedy,Drama       2018
1  Bridgerton           Drama,Romance       2020
2     Vikings  Action,Adventure,Drama  2013-2020
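
One possible reason the same snippet fails on a scraped frame is that some cells are not strings (missing values, for example), in which case re.sub raises a TypeError. This is only a guess, since the real data isn't shown. A minimal sketch of the same idea that skips non-string cells is below; clean_symbols is a hypothetical helper name, and the small dataframe is made-up stand-in data, not the asker's table.

```python
import re

import pandas as pd

# Characters to delete, same set as in the answer.
SIMBOLOS = "][\n)( "


def clean_symbols(frame, symbols=SIMBOLOS):
    """Return a copy of `frame` with every character in `symbols` removed.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    # re.escape keeps characters such as ']' or '-' literal inside the class.
    pattern = re.compile("[" + re.escape(symbols) + "]")
    cleaned = frame.copy()
    for column in cleaned.columns:
        # Assumption: some cells may not be strings (e.g. None/NaN),
        # so only str values are substituted and the rest pass through.
        cleaned[column] = cleaned[column].map(
            lambda value: pattern.sub("", value) if isinstance(value, str) else value
        )
    return cleaned


# Made-up stand-in for an existing frame named `dataframe`.
dataframe = pd.DataFrame({
    "titulo": ["[Cobra Kai]", None],
    "genero": ["[\nAction, Comedy, Drama]", "[\nDrama, Romance]"],
    "ano": ["[(2018)]", "[(2013-2020)]"],
})
dataframe = clean_symbols(dataframe)
print(dataframe)
```

Because the space is in the symbol set, "Cobra Kai" becomes "CobraKai", as in the output above. Passing symbols="][\n)(" instead should keep the spaces inside titles. Separately, DataFrame.applymap is deprecated as of pandas 2.1 in favor of DataFrame.map, so the per-column Series.map used here avoids that warning.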
  • Very good, this code works on the new table! But when I use it on my own DataFrame, it doesn't work. Can you apply this code to the existing one, which is named dataframe (all lowercase), instead of creating a new one? simbolos = '][\n)( ' pattern = "[" + simbolos + "]" df = df.applymap((lambda x: re.sub(pattern, '', x)))

  • In other words, use the existing dataframe (all lowercase) instead of the new table below: titulo = ['[Cobra Kai]', '[Bridgerton]', '[Vikings]'] genero = ['[\nAction, Comedy, Drama]', '[\nDrama, Romance]','[\nAction,Adventure, Drama]'] ano = ['[(2018)]','[(2020)]','[(2013-2020)]']

  • @Mdatascience, good afternoon! I didn't quite understand your comment. Did it work for you or not? If it didn't, add your real data to the question so I can test it and help you more accurately. Cheers!

  • Your code works on its own, but it didn't work in my code, on the existing table. Yes, ideally it would be tested directly on my code, because I've been trying various approaches and want to understand why it raises an error. I'm new here. I can post the code (it's for study), but it isn't formatted.

  • @Mdatascience, good morning! Upload your data to a cloud hosting service and add the link to the question, so I can better analyze what is happening. Cheers!
