Degree of class grouping

I have several classes, each with dozens of students, and every year the students are reshuffled into new classes. I would like to calculate automatically how much of each class stays together from one year to the next. For example, in 2015 a school has two 1st-grade classes:

turma1a <- c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J')   
turma1b <- c('K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U')

In 2016 these students moved up to the 2nd grade and were randomly split into two new classes:

turma2a <- c('A', 'B', 'C', 'D', 'E', 'F', 'K', 'L', 'M', 'N')  
turma2b <- c('O', 'P', 'Q', 'R', 'S', 'T', 'U', 'G', 'H', 'I', 'J')

In this case, I would like to find that 60% of turma1a went to turma2a and that about 63.6% of turma1b went to turma2b.
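The percentages above are the number of students two classes share, divided by the size of the earlier class. A quick check in R, repeating the vectors from the question so the snippet runs on its own (the variable names `p_1a_2a` and `p_1b_2b` are just illustrative):

```r
turma1a <- c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J')
turma1b <- c('K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U')
turma2a <- c('A', 'B', 'C', 'D', 'E', 'F', 'K', 'L', 'M', 'N')
turma2b <- c('O', 'P', 'Q', 'R', 'S', 'T', 'U', 'G', 'H', 'I', 'J')

# Share of a 2015 class that ended up together in a given 2016 class
p_1a_2a <- length(intersect(turma1a, turma2a)) / length(turma1a)  # 6 / 10
p_1b_2b <- length(intersect(turma1b, turma2b)) / length(turma1b)  # 7 / 11

round(100 * c(p_1a_2a, p_1b_2b), 2)
# [1] 60.00 63.64
```

Note that 7/11 is 63.6363...%, so it rounds to 63.64% rather than 63.63%.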

I tried using intersect() in R to find which classes are most similar, but I would need to compare dozens of classes against each other.

   turma1a <- c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J')
   turma1b <- c('K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U')
   turma2a <- c('A', 'B', 'C', 'D', 'E', 'F', 'K', 'L', 'M', 'N')
   turma2b <- c('O', 'P', 'Q', 'R', 'S', 'T', 'U', 'G', 'H', 'I', 'J')
   intersect(turma1a, turma2a)
   # [1] "A" "B" "C" "D" "E" "F"

With that script I get the students the two classes have in common, but I need it to be automatic, because I have to analyze dozens of classes.

1 answer

A possible solution is to use two nested lapply calls to build a list of the students each pair of classes has in common, together with the corresponding proportions.
For this, it helps to put each year's classes together in a list.

lista_t1 <- list(turma1a, turma1b)
names(lista_t1) <- c("turma1a", "turma1b")
lista_t2 <- list(turma2a, turma2b)
names(lista_t2) <- c("turma2a", "turma2b")

Now apply the nested lapply calls to these lists.

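resultado <- lapply(lista_t1, function(x)          # x: a class from the first year
                lapply(lista_t2, function(y) {     # y: a class from the second year
                    int <- intersect(x, y)         # students present in both classes
                    # common students, and the share of x that they represent
                    list(comuns = int, prop = length(int)/length(x))
                })
            )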
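To read a single result, index resultado first by the earlier class and then by the later class. The sketch below is self-contained, so it rebuilds the inputs; it creates the named lists with list(turma1a = turma1a, ...), which is equivalent to setting names() afterwards.

```r
turma1a <- c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J')
turma1b <- c('K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U')
turma2a <- c('A', 'B', 'C', 'D', 'E', 'F', 'K', 'L', 'M', 'N')
turma2b <- c('O', 'P', 'Q', 'R', 'S', 'T', 'U', 'G', 'H', 'I', 'J')

lista_t1 <- list(turma1a = turma1a, turma1b = turma1b)
lista_t2 <- list(turma2a = turma2a, turma2b = turma2b)

resultado <- lapply(lista_t1, function(x)
  lapply(lista_t2, function(y) {
    int <- intersect(x, y)
    list(comuns = int, prop = length(int) / length(x))
  })
)

# Students from turma1a who are now in turma2a, and the share they represent
resultado$turma1a$turma2a$comuns
# [1] "A" "B" "C" "D" "E" "F"
resultado$turma1a$turma2a$prop
# [1] 0.6

# The same lookup with [[ ]], which also works when the names are in variables
resultado[["turma1b"]][["turma2b"]]$prop
# [1] 0.6363636
```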

Of course there are many other ways to solve this problem; this is just one of them, and the structure of resultado may not be the most convenient. (Lists nested inside lists are always somewhat awkward to work with.)
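One alternative, sketched here rather than taken from the answer above, is to keep only the proportions in a matrix, with one row per earlier class and one column per later class. `which.max` then picks, for each earlier class, the later class that kept the largest share (on a tie it returns the first one). The names `prop_mat` and `destino` are illustrative.

```r
turma1a <- c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J')
turma1b <- c('K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U')
turma2a <- c('A', 'B', 'C', 'D', 'E', 'F', 'K', 'L', 'M', 'N')
turma2b <- c('O', 'P', 'Q', 'R', 'S', 'T', 'U', 'G', 'H', 'I', 'J')

lista_t1 <- list(turma1a = turma1a, turma1b = turma1b)
lista_t2 <- list(turma2a = turma2a, turma2b = turma2b)

# sapply() returns one column per earlier class; t() turns them into rows,
# so cell [i, j] is the share of earlier class i that moved to later class j
prop_mat <- t(sapply(lista_t1, function(x)
  sapply(lista_t2, function(y) length(intersect(x, y)) / length(x))))

round(100 * prop_mat, 2)

# For each earlier class, the later class that kept most of its students
destino <- colnames(prop_mat)[apply(prop_mat, 1, which.max)]
names(destino) <- rownames(prop_mat)
destino
```

A matrix like this is also easy to export with write.csv() or to inspect visually, which is harder with the nested list.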

Note:
It may also be worth automating the creation of the class lists. For example, the class names can be assigned with

names(lista_t1) <- ls()[grep("turma1", ls())]
names(lista_t2) <- ls()[grep("turma2", ls())]

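which avoids having to type all the names, provided the classes were put into each list in the same (alphabetical) order in which ls() returns them.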
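A further shortcut, offered as an alternative to the grep() lines above: ls(pattern = ...) returns the matching object names, and mget() fetches those objects as a list that is already named after them, so each list and its names are built in one step. This assumes every class vector follows the turma<grade><letter> naming from the question; the anchored pattern below deliberately excludes names such as turma10a, so adjust it if your naming differs.

```r
turma1a <- c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J')
turma1b <- c('K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U')
turma2a <- c('A', 'B', 'C', 'D', 'E', 'F', 'K', 'L', 'M', 'N')
turma2b <- c('O', 'P', 'Q', 'R', 'S', 'T', 'U', 'G', 'H', 'I', 'J')

# ls(pattern = ) lists matching object names (sorted);
# mget() returns those objects as a list named after them
lista_t1 <- mget(ls(pattern = "^turma1[a-z]$"))
lista_t2 <- mget(ls(pattern = "^turma2[a-z]$"))

names(lista_t1)
# [1] "turma1a" "turma1b"
```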
