I have the following DataFrame:
df = pd.DataFrame({'Texto' : ['é importante o sucesso', 'o dia está lindo']})
I have two .txt files, palavras_positivas.txt and palavras_negativas.txt. Both files are read into DataFrames as follows:
positive = pd.read_csv('palavras_positivas.txt', encoding='utf-8')
negative = pd.read_csv('palavras_negativas.txt', encoding='utf-8')
Each .txt file consists of a single column, and each row holds a single word. Each DataFrame is then converted to a list of strings and joined into a single search pattern:
lista_positiva = positive['Positivas'].to_list()
pattern_positivo = '|'.join(lista_positiva)
The same process is applied to the negative words.
I then use pattern_positivo and pattern_negativo in the following code:
df3['Positivos'] = df3['Texto'].str.extractall(f"({pattern_positivo})").groupby(level=0).agg(','.join)
df3['Negativos'] = df3['Texto'].str.extractall(f"({pattern_negativo})").groupby(level=0).agg(','.join)
My problem is this:
In the text "é importante o sucesso" ("success is important"), I can extract "sucesso" ("success") and "importante" ("important") because they are on the positive list. In the negative column, however, I am getting the word for "impose", because that word exists in the negative list. So for some reason the code reads "importante" and returns the word for "impose".
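A minimal sketch of the behavior described above. The word lists are hypothetical stand-ins for the two .txt files; in particular, 'impor' is only an assumed negative entry, chosen because it is a prefix of 'importante'. With a plain alternation pattern, the regex is free to match inside a longer word:

```python
import pandas as pd

df = pd.DataFrame({'Texto': ['é importante o sucesso', 'o dia está lindo']})

# Hypothetical word list standing in for palavras_negativas.txt
lista_negativa = ['impor', 'feio']
pattern_negativo = '|'.join(lista_negativa)

# extractall returns one row per match, indexed by (original row, match number)
matches = df['Texto'].str.extractall(f"({pattern_negativo})")

# Join the matches of each original row; column 0 is the single capture group
negativos = matches.groupby(level=0).agg(','.join)[0]

# Row 0 yields 'impor', matched inside 'importante'
print(negativos)
```

Nothing in the pattern 'impor|feio' requires the match to start or end at a word boundary, so a substring hit counts as a match.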
My goal is to extract only exact, whole-word occurrences in both extractall calls.
Can someone help me?
Good evening! Could I ask what the format of your txt files is?
– lmonferrari
The txt files contain single words arranged in a single column.
– StatsPy
Which delimiter?
– Ewerton Belo
After transforming the txt files into a string, the delimiter is a pipe: '|'.join(lista_positiva)
– StatsPy
Are you using regex? Which one?
– Ewerton Belo
In this case it is returning whatever it finds; nothing in the pattern says it should match only when the whole word is equal.
– Ewerton Belo
That's right. I need to change the code so it returns only exact occurrences, comparing the text against the list. That is where I need help.
– StatsPy
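One possible way to get whole-word matches, following the point raised in the comments, is to wrap the alternation in \b word boundaries. This is only a sketch under assumptions: the word lists are hypothetical, build_pattern and extract_words are illustrative helper names, and it swaps extractall/groupby for str.findall plus str.join so that rows with no match come back as an empty string instead of being dropped.

```python
import re
import pandas as pd

df = pd.DataFrame({'Texto': ['é importante o sucesso', 'o dia está lindo']})

# Hypothetical word lists standing in for the two .txt files
lista_positiva = ['importante', 'sucesso', 'lindo']
lista_negativa = ['impor', 'feio']

def build_pattern(words):
    # Escape each word so regex metacharacters are treated literally,
    # then require a word boundary on both sides of the alternation
    alternatives = '|'.join(re.escape(w) for w in words)
    return rf"\b(?:{alternatives})\b"

def extract_words(texts, words):
    # findall gives a list of matches per row; str.join concatenates each list
    return texts.str.findall(build_pattern(words)).str.join(',')

df['Positivos'] = extract_words(df['Texto'], lista_positiva)
df['Negativos'] = extract_words(df['Texto'], lista_negativa)
print(df)
```

With these assumed lists, 'impor' no longer matches inside 'importante', because the character after it is a letter rather than a word boundary. The same \b-wrapped pattern, written with a capturing group, should also work with extractall if that form is preferred.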