Complementing @Lacobus's answer: to find out how each word is classified, you can count its occurrences in positive and negative tweets separately, as follows:
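import csv
import string
from collections import Counter

palavras = []   # every word from every tweet
positivo = []   # words from tweets labelled 'positivo'
negativo = []   # words from all other tweets

with open('tweets.csv', newline='') as arqcsv:
    leitor = csv.reader(arqcsv, delimiter=';')
    for linha in leitor:
        # Column 0 is the tweet text: lowercase it, split on whitespace,
        # and strip surrounding punctuation from each word.
        plinha = [palavra.strip(string.punctuation) for palavra in linha[0].lower().split()]
        palavras += plinha
        # Column 1 is the label; anything other than 'positivo' counts as negative.
        if linha[1].lower() == 'positivo':
            positivo += plinha
        else:
            negativo += plinha

cntPalavras = Counter(palavras)
cntPositivo = Counter(positivo)
cntNegativo = Counter(negativo)

# Most frequent words first, with the positive/negative split for each.
for palavra, frequencia in sorted(cntPalavras.items(), key=lambda i: i[1], reverse=True):
    pos = cntPositivo[palavra]
    neg = cntNegativo[palavra]
    print('{}: [f: {}, p: {}, n: {}]'.format(palavra, frequencia, pos, neg))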
Using the same test CSV file, this produces the following output (f = total frequency, p = occurrences in positive tweets, n = occurrences in negative tweets):
nec: [f: 4, p: 4, n: 0]
sed: [f: 4, p: 3, n: 1]
sit: [f: 3, p: 3, n: 0]
amet: [f: 3, p: 3, n: 0]
mauris: [f: 3, p: 1, n: 2]
vel: [f: 3, p: 1, n: 2]
dolor: [f: 2, p: 2, n: 0]
elit: [f: 2, p: 2, n: 0]
odio: [f: 2, p: 2, n: 0]
rutrum: [f: 2, p: 2, n: 0]
facilisis: [f: 2, p: 1, n: 1]
convallis: [f: 2, p: 2, n: 0]
luctus: [f: 2, p: 2, n: 0]
purus: [f: 2, p: 2, n: 0]
interdum: [f: 2, p: 2, n: 0]
id: [f: 2, p: 2, n: 0]
malesuada: [f: 2, p: 2, n: 0]
in: [f: 2, p: 0, n: 2]
faucibus: [f: 2, p: 1, n: 1]
et: [f: 2, p: 1, n: 1]
maximus: [f: 2, p: 0, n: 2]
justo: [f: 2, p: 1, n: 1]
morbi: [f: 2, p: 1, n: 1]
enim: [f: 2, p: 2, n: 0]
tristique: [f: 2, p: 2, n: 0]
felis: [f: 2, p: 1, n: 1]
risus: [f: 2, p: 1, n: 1]
etiam: [f: 2, p: 0, n: 2]
vitae: [f: 2, p: 1, n: 1]
pharetra: [f: 2, p: 0, n: 2]
lorem: [f: 1, p: 1, n: 0]
ipsum: [f: 1, p: 1, n: 0]
consectetur: [f: 1, p: 1, n: 0]
adipiscing: [f: 1, p: 1, n: 0]
pellentesque: [f: 1, p: 1, n: 0]
scelerisque: [f: 1, p: 1, n: 0]
nunc: [f: 1, p: 1, n: 0]
maecenas: [f: 1, p: 1, n: 0]
venenatis: [f: 1, p: 1, n: 0]
nulla: [f: 1, p: 1, n: 0]
elementum: [f: 1, p: 1, n: 0]
est: [f: 1, p: 1, n: 0]
vivamus: [f: 1, p: 0, n: 1]
non: [f: 1, p: 0, n: 1]
nullam: [f: 1, p: 0, n: 1]
lacinia: [f: 1, p: 0, n: 1]
massa: [f: 1, p: 0, n: 1]
libero: [f: 1, p: 0, n: 1]
vulputate: [f: 1, p: 0, n: 1]
nisi: [f: 1, p: 0, n: 1]
suscipit: [f: 1, p: 0, n: 1]
consequat: [f: 1, p: 0, n: 1]
neque: [f: 1, p: 1, n: 0]
semper: [f: 1, p: 1, n: 0]
ante: [f: 1, p: 1, n: 0]
aliquam: [f: 1, p: 1, n: 0]
egestas: [f: 1, p: 1, n: 0]
integer: [f: 1, p: 1, n: 0]
eget: [f: 1, p: 1, n: 0]
efficitur: [f: 1, p: 1, n: 0]
accumsan: [f: 1, p: 1, n: 0]
quis: [f: 1, p: 1, n: 0]
tempor: [f: 1, p: 1, n: 0]
ut: [f: 1, p: 1, n: 0]
magna: [f: 1, p: 0, n: 1]
augue: [f: 1, p: 0, n: 1]
quisque: [f: 1, p: 1, n: 0]
blandit: [f: 1, p: 1, n: 0]
sollicitudin: [f: 1, p: 1, n: 0]
rhoncus: [f: 1, p: 1, n: 0]
lectus: [f: 1, p: 1, n: 0]
congue: [f: 1, p: 1, n: 0]
lacus: [f: 1, p: 1, n: 0]
donec: [f: 1, p: 1, n: 0]
leo: [f: 1, p: 1, n: 0]
gravida: [f: 1, p: 1, n: 0]
tortor: [f: 1, p: 1, n: 0]
ex: [f: 1, p: 0, n: 1]
tellus: [f: 1, p: 0, n: 1]
orci: [f: 1, p: 1, n: 0]
varius: [f: 1, p: 1, n: 0]
natoque: [f: 1, p: 1, n: 0]
penatibus: [f: 1, p: 1, n: 0]
magnis: [f: 1, p: 1, n: 0]
dis: [f: 1, p: 1, n: 0]
parturient: [f: 1, p: 1, n: 0]
montes: [f: 1, p: 1, n: 0]
nascetur: [f: 1, p: 0, n: 1]
ridiculus: [f: 1, p: 0, n: 1]
mus: [f: 1, p: 0, n: 1]
at: [f: 1, p: 0, n: 1]
porta: [f: 1, p: 0, n: 1]
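If more labels than positive and negative ever show up (for example a neutral class), the same idea can be written with one Counter per label instead of one list per label. The following is only a sketch under assumptions: the function name `count_words_by_label` and the in-memory sample rows are hypothetical and stand in for `tweets.csv`, which is assumed to keep the `text;label` layout used above.

```python
import csv
import io
import string
from collections import Counter, defaultdict


def count_words_by_label(rows):
    """Return (total, per_label) word counts for an iterable of [text, label] rows."""
    total = Counter()
    per_label = defaultdict(Counter)
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        text, label = row[0], row[1].strip().lower()
        words = [w.strip(string.punctuation) for w in text.lower().split()]
        words = [w for w in words if w]  # drop tokens that were only punctuation
        total.update(words)
        per_label[label].update(words)
    return total, per_label


# Hypothetical sample data standing in for tweets.csv (one "text;label" per line).
sample = io.StringIO(
    "Lorem ipsum dolor;positivo\n"
    "Dolor sit amet!;negativo\n"
    "Ipsum dolor;positivo\n"
)
total, per_label = count_words_by_label(csv.reader(sample, delimiter=';'))

# Same report format as above, most frequent words first.
for word, freq in total.most_common():
    print('{}: [f: {}, p: {}, n: {}]'.format(
        word, freq, per_label['positivo'][word], per_label['negativo'][word]))
```

To run it on the real file, pass `csv.reader(arqcsv, delimiter=';')` to the function instead of the sample. Unlike the version above, rows with a label other than `positivo` are not lumped into the negative count; each label gets its own Counter.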
Will your "frequency table" be calculated individually for each tweet, or will it be a single table for all tweets?
– Lacobus
A single table for all tweets, containing all the words of all the tweets together.
– Gabriel Augusto