Problem with unnest_tokens()

I am a linguist working in text mining, and I am trying to import a novel (200 pages of plain running text with line breaks) into R for analysis. Both of my attempts fail with the same error.

First I tried:

library(tidytext)
library(dplyr)
library(readr)
estrela <- readLines("estrela.txt")
estrela.tidy <- estrela %>%
  unnest_tokens(word, text)

Then I tried:

library(tidytext)
library(dplyr)
library(readr)
estrela <- read_file("estrela.txt")
estrela.tidy <- estrela %>%
  unnest_tokens(word, text)

In both cases I get:

Error in UseMethod("unnest_tokens_"): no applicable method for 'unnest_tokens_' applied to an object of class "character"

Am I doing something wrong? Is there any simpler way to import a text to work with tidytext?

Thank you very much!

1 answer


You need to convert your text into a data frame:

library(tidytext)
library(dplyr)
library(readr)
estrela <- "texto stackoverflow português"
estrela <- data.frame(text = estrela, stringsAsFactors = FALSE)
estrela.tidy <- estrela %>%
  unnest_tokens(word, text)

Result (printing estrela.tidy):

             word
1           texto
1.1 stackoverflow
1.2     português

The error Error in UseMethod("unnest_tokens_"): no applicable method for 'unnest_tokens_' applied to an object of class "character" means that estrela is of class character, which unnest_tokens does not accept as input. After converting estrela to a data frame, we have the following:

estrela <- data.frame(text = estrela, stringsAsFactors = FALSE)

On the console:

> class(estrela)
[1] "data.frame"

The documentation for unnest_tokens (run ?unnest_tokens in the console) says:

Arguments

tbl The data frame
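
Applied to a file read with readLines(), as in the question, the same idea could look roughly like the sketch below. It is only an illustration under a few assumptions: a temporary file with made-up content stands in for estrela.txt so the code runs on its own, and the line column is an optional addition that was not in the original code.

```r
library(tidytext)
library(dplyr)

# Hypothetical stand-in for estrela.txt so the sketch is self-contained
path <- tempfile(fileext = ".txt")
writeLines(c("primeira linha do romance", "segunda linha do romance"), path)

# readLines() returns a character vector with one element per line
lines <- readLines(path, encoding = "UTF-8")

# Wrap the vector in a data frame; unnest_tokens() needs a data frame,
# and the text must live in a named column (here: text)
estrela <- data.frame(line = seq_along(lines), text = lines,
                      stringsAsFactors = FALSE)

# One row per word; the line column records where each word came from
estrela.tidy <- estrela %>%
  unnest_tokens(word, text)
```

With the real file, path would simply be "estrela.txt". The same wrapping should work for read_file(), which returns the whole file as a single string: data.frame(text = read_file("estrela.txt"), stringsAsFactors = FALSE) gives a one-row data frame that unnest_tokens() can split into words.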

  • Thank you very much!
