Wrong result when grouping rows of a data frame


I am working with a data frame and want to group the rows that have the same value in the first column (Curso, which is a factor). To do that I ran the following commands:

library(dplyr)

data.test2 %>%
  group_by(Curso) %>%
  summarise(Total_Vagas1 = sum(data.test2$`Vaga 1 Sem`))

My data frame originally looked more or less like this:

1 ADMINISTRAÇÃO                12  
2 ADMINISTRAÇÃO                45
3 ADMINISTRAÇÃO                86
4 ARTE E MÍDIA                 35
5 ARTE E MÍDIA                 24
6 CIÊNCIAS ECONÔMICAS          55
7 CIÊNCIAS ECONÔMICAS           5
8 CIÊNCIAS ECONÔMICAS         255

The commands above return this:

# A tibble: 3 x 2
  Curso               Total_Vagas1
  <fct>                      <int>
1 ADMINISTRAÇÃO                517
2 ARTE E MÍDIA                 517
3 CIÊNCIAS ECONÔMICAS          517

Note that R summed every row of the column `Vaga 1 Sem` and put that same grand total on each line. What I actually want is the total number of vacancies for each course (administration, agronomy, etc.) on its own line.

Data in dput format

data.test2 <-
structure(list(Curso = structure(c(1L, 1L, 1L, 2L, 
2L, 3L, 3L, 3L), .Label = c("ADMINISTRAÇÃO", 
"ARTE E MÍDIA", "CIÊNCIAS ECONÔMICAS"), class = "factor"), 
`Vaga 1 Sem` = c(12L, 45L, 86L, 35L, 24L, 55L, 5L, 255L)), 
row.names = c("1", "2", "3", "4", "5", "6", "7", "8"), 
class = "data.frame")
  • Hello, Oyo. This topic explains how to produce a minimal reproducible example in R: https://pt.meta.stackoverflow.com/questions/824/como-cria-um-exemplo-m%C3%Adnimo-reproduces%C3%Advel-em-r

  • @Carloseduardolagosta Would you like to vote to reopen? The question is now reproducible.

  • I don’t have that privilege

1 answer


Remove the data.test2$ from inside sum(...). With it, sum() receives the entire column of the original data frame instead of the rows of the current group, so dplyr reports the grand total for every group.

library(dplyr)

# Example data
set.seed(876)
dados <- tibble(Curso = as.factor(rep(LETTERS[1:4], each = 3)),
                Vagas = sample(20:100, 12))
names(dados)[2] <- "Vaga 1 Sem"

> sum(dados$`Vaga 1 Sem`)
[1] 720

> dados %>% group_by(Curso) %>% summarise(Total_Vagas1 = sum(dados$`Vaga 1 Sem`))
# A tibble: 4 x 2
  Curso Total_Vagas1
  <fct>        <int>
1 A              720
2 B              720
3 C              720
4 D              720

> dados %>% group_by(Curso) %>% summarise(Total_Vagas1 = sum(`Vaga 1 Sem`))
# A tibble: 4 x 2
  Curso Total_Vagas1
  <fct>        <int>
1 A              201
2 B              140
3 C              202
4 D              177
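
For completeness, here is a minimal sketch of the same fix applied to the data from the question. It assumes the data frame is rebuilt from the posted dput output (rewritten here with data.frame() for brevity), and the name `totais` is only an illustrative choice. The per-course totals add up to the 517 that the original call repeated on every line.

```r
library(dplyr)

# Data rebuilt from the dput() output in the question.
# check.names = FALSE keeps the space-containing column name `Vaga 1 Sem`.
data.test2 <- data.frame(
  Curso = factor(c(rep("ADMINISTRAÇÃO", 3),
                   rep("ARTE E MÍDIA", 2),
                   rep("CIÊNCIAS ECONÔMICAS", 3))),
  `Vaga 1 Sem` = c(12L, 45L, 86L, 35L, 24L, 55L, 5L, 255L),
  check.names = FALSE
)

# Refer to the bare column name so sum() only sees the rows of each group.
# `totais` is a hypothetical name used for illustration.
totais <- data.test2 %>%
  group_by(Curso) %>%
  summarise(Total_Vagas1 = sum(`Vaga 1 Sem`))

print(totais)
# ADMINISTRAÇÃO        143
# ARTE E MÍDIA          59
# CIÊNCIAS ECONÔMICAS  315
```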
