I am just starting out in data science. I wrote some code in a Jupyter notebook to work with CBOW and Skip-Gram, and I need to plot a similarity graph. Several people I talked to recommended t-SNE (TSNE). Unfortunately, I can't work out which parameters to pass to produce the plot. I'm using the text8 corpus.
import pandas as pd
import gensim, nltk, warnings
from gensim.models import Word2Vec
from nltk.tokenize import sent_tokenize, word_tokenize
warnings.filterwarnings(action = 'ignore')
#nltk.download('punkt')
#!pip install wordcloud
from wordcloud import WordCloud
from matplotlib import pyplot as plt
from sklearn.manifold import TSNE
from bokeh.io import push_notebook, show, output_notebook
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, LabelSet
from sklearn.decomposition import PCA
# text8 is a plain-text file, not a CSV, so read it directly
with open('text8', 'r') as corpus:
    raw_sentences = corpus.read()
sentences = raw_sentences.replace("\n", " ")
sentences[:5000]
data = []
for i in sent_tokenize(sentences):
    temp = []
    for j in word_tokenize(i):
        temp.append(j.lower())
    data.append(temp)
len(data)
# sg=0 (the default) trains CBOW; in gensim >= 4.0 the 'size' argument is named 'vector_size'
model1 = Word2Vec(data, min_count = 1, size = 100, window = 5)
# Similarity queries go through the model's KeyedVectors (model1.wv)
print("Similarity between 'revolution' and 'governance' using CBOW: {}".format(model1.wv.similarity('revolution', 'governance')))
print("Similarity between 'anarchism' and 'anarchist' using CBOW: {}".format(model1.wv.similarity('anarchism', 'anarchist')))
print("Similarity between 'chaos' and 'revolution' using CBOW: {}".format(model1.wv.similarity('chaos', 'revolution')))
print("Similarity between 'american' and 'indian' using CBOW: {}".format(model1.wv.similarity('american', 'indian')))
print("Similarity between 'movement' and 'civil' using CBOW: {}".format(model1.wv.similarity('movement', 'civil')))
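The similarity values printed above could be kept rather than only printed. A minimal sketch, assuming a hypothetical helper `save_similarities` and a hypothetical output file `similarities.csv` (neither is part of gensim); the `similarity` argument is any callable that takes two words and returns a number:

```python
import csv

# The word pairs compared in the notebook above.
pairs = [
    ("revolution", "governance"),
    ("anarchism", "anarchist"),
    ("chaos", "revolution"),
    ("american", "indian"),
    ("movement", "civil"),
]


def save_similarities(pairs, similarity, path="similarities.csv"):
    """Write one CSV row per word pair: word_a, word_b, similarity.

    `similarity` is any callable taking two words and returning a number
    (hypothetical helper, not a gensim function).
    """
    rows = [(a, b, float(similarity(a, b))) for a, b in pairs]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word_a", "word_b", "similarity"])
        writer.writerows(rows)
    return rows
```

With the trained model, the call would presumably look like `save_similarities(pairs, model1.wv.similarity)`.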
At this point I would like to save the results and generate a plot to check the similarity. Could you point me in the right direction? Thanks in advance.
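For reference, a rough sketch of the kind of plot being asked about: reduce a handful of word vectors to two dimensions with scikit-learn's `TSNE` and label each point with its word. The helper names `tsne_coordinates` and `plot_words`, the output file `tsne_words.png`, and the parameter values (`perplexity=5`, `init="pca"`) are assumptions chosen for illustration, not recommended settings.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_coordinates(vectors, perplexity=5, seed=0):
    """Reduce word vectors of shape (n_words, n_dims) to 2-D points with t-SNE."""
    vectors = np.asarray(vectors, dtype=np.float64)
    # Recent scikit-learn releases require perplexity < number of samples,
    # so clamp it for small word lists.
    perplexity = min(perplexity, len(vectors) - 1)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca",
                random_state=seed)
    return tsne.fit_transform(vectors)


def plot_words(words, coords, path="tsne_words.png"):
    """Scatter the 2-D points, label each with its word, and save the figure."""
    # Imported here so the projection step can run without matplotlib.
    from matplotlib import pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 8))
    ax.scatter(coords[:, 0], coords[:, 1])
    for word, (x, y) in zip(words, coords):
        ax.annotate(word, (x, y), xytext=(4, 4), textcoords="offset points")
    fig.savefig(path)
    return fig
```

With the model from the notebook, `words` would be a list of vocabulary entries and the vectors could be gathered as `[model1.wv[w] for w in words]` before calling `tsne_coordinates` and then `plot_words(words, coords)`. Points that land close together usually have similar vectors, but t-SNE preserves local neighbourhoods rather than exact distances, so the picture is only a rough guide.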