Create a frequency distribution matrix in R (categorical variable vs. numeric)

I have two columns: offer status (a categorical variable) and number of students (a numeric variable). I want to create a table that shows how many students fall into each category (level) of the offer status. This is the goal (example):

Frequency distribution (offer status x number of students) EXAMPLE

Thank you in advance. PS: By helping me with this work, you will help spread innovations like Stack Overflow in public education (this is my area of study).

  • What have you done so far? We help fix mistakes...

  • Thank you for your participation, Renaro! I’m sorry, I’m new here and I tried to be as concise as possible. Here is one of my attempts:

    > setwd("C:/R")
    > library(car)
    > Uabalunos <- read.csv("C:/R/UAB.Alunos_por_polos2014.csv", header = TRUE, dec = ".", sep = ";")
    > UAB <- data.frame(Uabalunos)
    > situation <- factor <- factor(UAB$no_situacao_oferta_41)
    > offer <- (UAB$oferta_numero_alunos_cadastrados_42)
    > sitvsof <- table(UAB$situation, UAB$offer)
    > sitvsof
    < table of extent 0 x 0 >

  • Write what you did along with the answer.

1 answer


I’ll create an example data.frame:

library(dplyr)
base <- data.frame(
  situacao = rep(c("a ser concluida", "ativa", "concluida"), length.out = 100),
  qtd_alunos = rep(c(6,7,2,3), length.out = 100)
  )

> head(base)
         situacao qtd_alunos
1 a ser concluida          6
2           ativa          7
3       concluida          2
4 a ser concluida          3
5           ativa          6
6       concluida          7

You can then aggregate the qtd_alunos by the categories of the variable situacao using:

base %>% group_by(situacao) %>% summarise(qtd_alunos = sum(qtd_alunos))

Source: local data frame [3 x 2]

         situacao qtd_alunos
1 a ser concluida        153
2           ativa        151
3       concluida        146

The function group_by specifies which variable to group by, and the function summarise specifies how to aggregate the information within each group; in this case, we use the sum.
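For comparison, the same totals can be obtained without dplyr. The following is a minimal sketch of a base R alternative using aggregate(); it is not the approach used above, and the name totais is only an illustrative choice. The example data frame is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data frame used in this answer
base <- data.frame(
  situacao = rep(c("a ser concluida", "ativa", "concluida"), length.out = 100),
  qtd_alunos = rep(c(6, 7, 2, 3), length.out = 100)
)

# Base R alternative to group_by() + summarise():
# sum qtd_alunos within each level of situacao
# ("totais" is a hypothetical name chosen for this sketch)
totais <- aggregate(qtd_alunos ~ situacao, data = base, FUN = sum)
print(totais)
```

The result is a data frame with one row per category, which should match the totals returned by the dplyr version.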

If each student is a row in your data set and you do not have the column qtd_alunos, you could use the following code to create a frequency table:

base %>% group_by(situacao) %>% summarise(qtd_alunos = n())

In this case, n() counts the number of rows in each category.
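Counting rows per category can also be sketched in base R with table(), which may be closer to what the question was attempting. This is an illustrative alternative, not part of the dplyr approach; the names contagem and contagem_df are hypothetical, and the data frame below assumes one row per student.

```r
# Example data with one row per student (only the categorical column is needed)
base <- data.frame(
  situacao = rep(c("a ser concluida", "ativa", "concluida"), length.out = 100)
)

# Count how many rows fall into each category of situacao
# ("contagem" is a hypothetical name chosen for this sketch)
contagem <- table(base$situacao)
print(contagem)

# Optionally convert the table to a data frame, naming the count column qtd_alunos
contagem_df <- as.data.frame(contagem, responseName = "qtd_alunos")
print(contagem_df)
```

Note that table() must be given columns that actually exist in the data frame. In the attempt quoted in the comments, UAB$situation and UAB$offer refer to columns that were never added to UAB, so they evaluate to NULL, which is a likely reason that call returned an empty table.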

Note that to use the functions group_by and summarise you need to have the dplyr package installed, with install.packages("dplyr"), and then load it with library(dplyr).

  • I don’t have enough reputation to upvote, but know that your answer was very good! Congratulations on the quality and speed; I’m now a Stack Overflow enthusiast. Thank you very much!
