How to do Descending Hierarchical Classification (CHD)/Reinert method in R?

Iramuteq has a very common analysis, the Descending Hierarchical Classification (CHD), also known as Reinert's method. Here is an example: https://www.youtube.com/watch?v=H9xliY7Zy40

It groups similar texts together, creating new "sections" (similar to factor analysis).

  1. How can I do the same in R?
  2. Is there an English name for it?

Here is an example:

The analysis begins with your text corpus:

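library(quanteda)
library(quanteda.textmodels)  # recent quanteda versions ship data_corpus_irishbudget2010 here
# Recent quanteda versions expect tokens, not a corpus, as input to dfm()
dfm1 <- dfm(tokens(data_corpus_irishbudget2010))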

and from it (I imagine) one reproduces the CHD, which gives the following main results:

[image]

It classifies the corpus/text into different sections (classes), similar to the factors in factor analysis, so that 30.4% of the text was classified in Class 4 and 28.4% in Class 3 (another subject). The classes have a practical interpretation, based on the most frequent words that appear in each class, which are presented in another very common table, shown below:

[image]

Each class is named according to the words/attributes that appear in it. For example, Class 4 is about nutritional aspects, because the words that appear together in it are basically foods.

  • Does it have to be this hierarchical classification, or can it be another method? For hierarchical clustering in R, see ?hclust.

  • The thing is, I'm going to apply the technique to data that was collected as text. I even thought of computing some distance from the dfm generated in text mining and then applying hclust, but it's not the same. This technique, as I understand it, groups the texts into sections (classes) and then reports which words are associated with each section based on the chi-square statistic.

  • Try to improve your question by providing a reproducible example and the expected result.

  • I changed it, @Tomásbarcellos, but I don't know much more than that.

  • OP, I'm not much into this area, but I've used a text classification algorithm that does something similar to what you're asking; take a look at this chapter here.

  • Thank you @Jdemello, I took your advice and managed to finish the analysis with this type of modeling. Searching by the name in the question title, I found nothing in R.

  • A package for this should show up soon: https://juba.github.io/rainette/
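
For anyone landing here later: the rainette package linked in the last comment implements a Reinert-style descending hierarchical classification on top of quanteda. Below is a minimal sketch, not a definitive recipe. It assumes the rainette and quanteda packages are installed, and it swaps in data_corpus_inaugural (which ships with quanteda) for the corpus in the question. The segment size, the trimming threshold and k = 5 are arbitrary illustrative choices, and argument names may differ between rainette versions.

```r
library(quanteda)
library(rainette)

# Assumption: data_corpus_inaugural stands in for the corpus in the question.
# Split each document into short segments (about 40 words here, an arbitrary
# choice), since the method classifies text segments rather than whole documents.
corp <- split_segments(data_corpus_inaugural, segment_size = 40)

# Tokenise, drop punctuation and English stopwords, and build the dfm.
tok <- tokens(corp, remove_punct = TRUE)
tok <- tokens_remove(tok, stopwords("en"))
dtm <- dfm(tok, tolower = TRUE)

# Keep only terms that occur in at least 10 segments (illustrative threshold).
dtm <- dfm_trim(dtm, min_docfreq = 10)

# Descending hierarchical classification into at most k classes.
res <- rainette(dtm, k = 5)

# Class membership of each segment (segments left unclassified are NA).
groups <- cutree_rainette(res, k = 5)

# Terms most associated with each class, ranked by chi-square.
stats <- rainette_stats(groups, dtm, n_terms = 10)

# Dendrogram with the share of segments and the characteristic terms per class.
rainette_plot(res, dtm, k = 5)
```

The share of text in each class, like the 30.4% and 28.4% figures in the question, can be computed with prop.table(table(groups)). The per-class term lists in stats play the role of the second table in the question, where each class is named after its most characteristic words.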

No answers
