I need to analyze books written in Brazilian Portuguese. To build a word frequency list per book, I am using the following commands:
    library(dplyr)
    library(tidytext)
    library(stringr)

    # One token (word) per row for each book
    GS.tidy <- GS %>%
      unnest_tokens(word, text)
    MM.tidy <- MM %>%
      unnest_tokens(word, text)
    NS.tidy <- NS %>%
      unnest_tokens(word, text)
    Sa.tidy <- Sa %>%
      unnest_tokens(word, text)

    # Label each book, combine them, and count words per book
    frequencia.guimaraes <- bind_rows(mutate(MM.tidy, livro = "MM"),
                                      mutate(GS.tidy, livro = "GS"),
                                      mutate(NS.tidy, livro = "NS"),
                                      mutate(Sa.tidy, livro = "Sa")) %>%
      mutate(word = str_extract(word, "[a-z']+")) %>%
      count(livro, word) %>%
      group_by(livro)
However, I noticed that the accented words are disappearing, and I need to keep them. Any hints?
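A small check on made-up words (not taken from the books) suggests the `str_extract()` step is where this happens: the class `[a-z']` only covers unaccented ASCII letters, so a word gets cut at its first accented character. This is a minimal sketch, assuming stringr is loaded and the strings are UTF-8 in composed form; the `\p{L}` pattern (any Unicode letter) is one possible replacement that I have not confirmed on the full data.

```r
library(stringr)

# Made-up sample tokens, standing in for the output of unnest_tokens()
palavras <- c("coração", "não", "sertão", "d'água")

# Pattern from the pipeline above: ASCII letters and apostrophes only
ascii <- str_extract(palavras, "[a-z']+")

# Possible alternative: any Unicode letter (\p{L}) plus apostrophes
unicode <- str_extract(palavras, "[\\p{L}']+")

print(ascii)    # "cora" "n" "sert" "d'"
print(unicode)  # "coração" "não" "sertão" "d'água"
```

If that holds for the real text, changing the pattern in the `mutate(word = str_extract(...))` line might be the only change needed.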
Thank you very much!
– user135517