Stem for Twitter

Asked 6 years, 4 months ago

Viewed 90 times

Dear colleagues, I am trying to analyze a Twitter timeline and need to stem the texts for the analysis. I am trying the following procedure:

setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
tweets <- userTimeline("Pragmatismo_", n = 3000)
tweets.df <- twListToDF(tweets)
myCorpus <- Corpus(VectorSource(tweets.df$text))
removeURL <- function(x) gsub("http[[:alnum:][:punct:]]*", "", x) 
removeNumPunct <- function(x) gsub('[[:punct:]]', '', x)
myCorpus <- tm_map(myCorpus, content_transformer(removeNumPunct))
myCorpus <- tm_map(myCorpus, content_transformer(removeURL))
myCorpus <- tm_map(myCorpus, ptstem)

The problem is that even after the last command, myCorpus <- tm_map(myCorpus, ptstem), the text does not appear to be stemmed.

Any tips? Thank you very much!

Does that help you? https://github.com/dfalbel/ptstem

– Tomás Barcellos

2019/02/25 at 16:51

I’ll try. Thank you!

– user135517

2019/02/26 at 02:10

The function ptstem, which is used in tm_map, is not defined in the question, nor in the most common libraries for this purpose. Could you indicate which package you took it from? Or include the function you wrote...

– Guilherme Parreira

2019/02/27 at 04:11

1 answer

by Guilherme Parreira • **2,060** points · Answer 1 · 2019-02-27T04:10:20+00:00

The main stemming function used with tm_map is stemDocument, but as shown in the answer to that question, it cannot be used for Portuguese because of a bug.

To work around this, I used the quanteda package:

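library(quanteda)
# dfm_wordstem() expects a dfm, so convert the tm corpus and tokenize it first
my_dfm <- dfm(tokens(corpus(myCorpus)))
my_dfm <- dfm_wordstem(my_dfm, language = "pt")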

Another option would be to adapt, if possible, the function ptstem::ptstem_words to your context (I have not tested this).
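A minimal sketch of that second option, untested against real tweets. It assumes that ptstem in the question refers to ptstem::ptstem from the package linked in the comments, and it uses two made-up sentences in place of tweets.df$text. Wrapping the stemmer in content_transformer lets tm_map apply it to the text of each document. Setting complete = FALSE is also an assumption worth trying: with the default complete = TRUE, stems are replaced by a full word form, which may be why the output does not look stemmed.

```r
library(tm)
library(ptstem)  # assumption: the question's ptstem is ptstem::ptstem

# Hypothetical example texts standing in for tweets.df$text
texts <- c("Os meninos estavam correndo", "As meninas correram ontem")
myCorpus <- VCorpus(VectorSource(texts))

# Wrap the stemmer so that tm_map applies it to each document's text.
# complete = FALSE keeps the raw stems instead of mapping them back to words.
stem_pt <- content_transformer(function(x) {
  ptstem(x, algorithm = "rslp", complete = FALSE)
})
myCorpus <- tm_map(myCorpus, stem_pt)

# Inspect the transformed text of each document
stemmed <- sapply(myCorpus, content)
print(stemmed)
```

If the stems look right on a small sample like this, the same stem_pt transformer can be applied to the cleaned tweet corpus after the URL and punctuation steps.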