I'm trying to apply the NMF algorithm to a CSV file and then extract the sentences linked to each topic:
import pandas
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
def display_topics(model, feature_names, no_top_words):
    for topic_idx, topic in enumerate(model.components_):
        print "Topic %d:" % (topic_idx)
        print " ".join([feature_names[i]
                        for i in topic.argsort()[:-no_top_words - 1:-1]])
textos = pandas.read_csv('teste_nmf.csv', encoding = 'utf-8')
textos_limpos = textos['frase_limpa']
textos_bruts = textos['frase_brut']
textos_bruts_list = textos_bruts.values.tolist()
textos_limpos_list = textos_limpos.values.tolist()
tfidf_vectorizer = TfidfVectorizer()
tfidf = tfidf_vectorizer.fit_transform(textos_limpos_list)
tfidf_feature_names = tfidf_vectorizer.get_feature_names()
# n_components: number of topics
nmf = NMF(n_components = 2, random_state = 1, alpha = .1, l1_ratio = .5, init = 'nndsvd').fit(tfidf)
# Number of words per topic
no_top_words = 2
# Display the topics with their words
print 'NMF'
topics = display_topics(nmf, tfidf_feature_names, no_top_words)
print topics
# Extract the sentences linked to each topic
for topic in range(len(topics)): #TypeError: object of type 'NoneType' has no len()
    print "Topic {}:".format(topic)
    docs = np.argsort(document_topics[:, topic])[::-1]
    for text in docs[:3]:
        text_brut = " ".join(textos_bruts_list[text].split(",")[:2])
        print " ".join(textos_limpos_list[text].split(",")[:2]) + ',' + text_brut
A rough example of the dataset:
frase_limpa,frase_brut
manga fruta gostosa,a manga é uma fruta gostosa
computador objeto importante,o computador é um objeto importante
banana fruto popular,a banana é um fruto popular
lapis coisa importante,o lapis é uma coisa importante
uva roxa,a uva é roxa
telefone objeto mundial,o telefone é um objeto mundial
My result:
NMF
Topic 0:
important object
Topic 1:
purple grape
None
Traceback (most recent call last):
  File "teste_NMF.py", line 55, in <module>
    for topic in range(len(topics)): #TypeError: object of type 'NoneType' has no len()
TypeError: object of type 'NoneType' has no len()
What I expected, more or less:
Topic 0:
important object
Topic 1:
purple grape
Topic 0:
computer important object, the computer is an important object
phone worldwide object, phone is a worldwide object
lapis important thing, lapis is an important thing
Topic 1:
purple grape, the grape is purple
But how could I fix this?
– marin
@marin It depends on your logic... For example, return something from display_topics: if you put return [] at the end, it will return an empty list and so will not raise the error, because len([]) is 0. I don't quite understand what you're writing: you created a function with the prefix display_, which means "show". Semantically it should display the topics and finish, but you're trying to use its return value in another part of the code! Should it be called format_topics? Or should you get what you need elsewhere? Or split it into 2 separate functions? – nosklo
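One way to read the "2 separate functions" suggestion is sketched below, as an illustration rather than the commenter's own code. The name get_topics and the example_* values are hypothetical; the made-up weights stand in for nmf.components_, and the sketch assumes each row can be indexed by position, as a row of that array can. The __future__ import only keeps the print calls working on Python 2 as well as Python 3.

```python
from __future__ import print_function  # keeps print() working on Python 2 as well


def get_topics(components, feature_names, no_top_words):
    """Return, for each topic, its top words ordered by decreasing weight."""
    topics = []
    for weights in components:
        # Indices of the features sorted from heaviest to lightest weight.
        order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        topics.append([feature_names[i] for i in order[:no_top_words]])
    return topics


def display_topics(topics):
    """Print the topics; this function only displays and returns nothing."""
    for topic_idx, words in enumerate(topics):
        print("Topic %d:" % topic_idx)
        print(" ".join(words))


# Made-up weights standing in for nmf.components_ (2 topics, 4 features).
example_components = [
    [0.1, 0.9, 0.0, 0.5],
    [0.7, 0.0, 0.8, 0.1],
]
example_features = ["fruta", "objeto", "uva", "importante"]

topics = get_topics(example_components, example_features, 2)
display_topics(topics)
print(len(topics))  # a list is returned, so len() no longer fails
```

With this split, the value stored in topics is a real list, so range(len(topics)) works, while display_topics keeps doing only what its name says.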
@marin In short, you have to know what you want to do with that part of the code, to tidy it up. But your question is answered.
– nosklo
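For the second half of the question (the sentences linked to each topic), note that the loop uses np and document_topics, neither of which is defined in the code shown. A minimal sketch of the ranking step follows, assuming document_topics is meant to be the document-topic weight matrix (for instance what nmf.transform(tfidf) would return; the question never says). The helper top_documents and the example_* values are hypothetical and made up for illustration.

```python
from __future__ import print_function  # keeps print() working on Python 2 as well


def top_documents(document_topics, topic, n):
    """Return indices of the n documents with the highest weight for `topic`."""
    order = sorted(range(len(document_topics)),
                   key=lambda d: document_topics[d][topic],
                   reverse=True)
    return order[:n]


# Made-up document-topic weights (4 documents, 2 topics), for illustration only.
example_doc_topics = [
    [0.8, 0.0],
    [0.1, 0.6],
    [0.5, 0.2],
    [0.0, 0.9],
]
example_sentences = ["frase a", "frase b", "frase c", "frase d"]

for topic in range(2):
    print("Topic {}:".format(topic))
    for doc in top_documents(example_doc_topics, topic, 2):
        print(example_sentences[doc])
```

The number of topics is taken here from the width of the matrix (2) rather than from the return value of a display function, which avoids the NoneType error altogether.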