Wrong return when grouping rows from a data frame

Question

Asked 5 years, 1 month ago

I am working with a data frame and want to group the rows that have the same value in the first column (Curso, which is a factor), summing the second column within each group. To do that, I ran the following commands:

library(dplyr)

data.test2 %>%
  group_by(Curso) %>%
  summarise(Total_Vagas1 = sum(data.test2$`Vaga 1 Sem`))

My data frame looks more or less like this:

1 ADMINISTRAÇÃO                12  
2 ADMINISTRAÇÃO                45
3 ADMINISTRAÇÃO                86
4 ARTE E MÍDIA                 35
5 ARTE E MÍDIA                 24
6 CIÊNCIAS ECONÔMICAS          55
7 CIÊNCIAS ECONÔMICAS           5
8 CIÊNCIAS ECONÔMICAS         255

The commands above return this:

# A tibble: 3 x 2
  Curso               Total_Vagas1
  <fct>                      <int>
1 ADMINISTRAÇÃO                517
2 ARTE E MÍDIA                 517
3 CIÊNCIAS ECONÔMICAS          517

Note that R summed the values of all rows in the column "Vaga 1 Sem" and assigned that grand total to every row. What I actually want is the total number of vacancies for each course (administration, agronomy, etc.) in its respective row.

Data in dput format

data.test2 <-
structure(list(Curso = structure(c(1L, 1L, 1L, 2L, 
2L, 3L, 3L, 3L), .Label = c("ADMINISTRAÇÃO", 
"ARTE E MÍDIA", "CIÊNCIAS ECONÔMICAS"), class = "factor"), 
`Vaga 1 Sem` = c(12L, 45L, 86L, 35L, 24L, 55L, 5L, 255L)), 
row.names = c("1", "2", "3", "4", "5", "6", "7", "8"), 
class = "data.frame")

Hello, Oyo. See this topic on how to produce a minimal reproducible example in R: https://pt.meta.stackoverflow.com/questions/824/como-cria-um-exemplo-m%C3%Adnimo-reproduces%C3%Advel-em-r

– Carlos Eduardo Lagosta

2020/06/14 at 02:05

@Carloseduardolagosta Would you like to vote to reopen? The question is now reproducible.

– Rui Barradas

2020/06/14 at 10:04

I don’t have that privilege.

– Carlos Eduardo Lagosta

2020/06/14 at 15:28

1 answer

by Carlos Eduardo Lagosta • **5,497** points · Answer 1 · 2020-06-14T02:35:03+00:00

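Remove the data.test2$ from inside sum(...). With it, the expression refers to the entire column of the original data frame rather than to the rows of the current group, so dplyr returns the grand total for every group.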

library(dplyr)

# Example data
set.seed(876)
dados <- tibble(Curso = as.factor(rep(LETTERS[1:4], each = 3)),
                Vagas = sample(20:100, 12))
names(dados)[2] <- "Vaga 1 Sem"

> sum(dados$`Vaga 1 Sem`)
[1] 720

> dados %>% group_by(Curso) %>% summarise(Total_Vagas1 = sum(dados$`Vaga 1 Sem`))
# A tibble: 4 x 2
  Curso Total_Vagas1
  <fct>        <int>
1 A              720
2 B              720
3 C              720
4 D              720

> dados %>% group_by(Curso) %>% summarise(Total_Vagas1 = sum(`Vaga 1 Sem`))
# A tibble: 4 x 2
  Curso Total_Vagas1
  <fct>        <int>
1 A              201
2 B              140
3 C              202
4 D              177
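
The same fix can be checked against the data from the question. What follows is a minimal sketch, assuming the values shown in the question's dput output; the names totais, n_grupo and n_dollar are introduced here only for illustration. The two length() columns show why the original call went wrong: the bare column name is evaluated per group, while data.test2$ always hands sum() all 8 rows.

```r
library(dplyr)

# Data rebuilt from the dput output in the question
data.test2 <- data.frame(
  Curso = factor(
    c(rep("ADMINISTRAÇÃO", 3), rep("ARTE E MÍDIA", 2), rep("CIÊNCIAS ECONÔMICAS", 3)),
    levels = c("ADMINISTRAÇÃO", "ARTE E MÍDIA", "CIÊNCIAS ECONÔMICAS")
  ),
  `Vaga 1 Sem` = c(12L, 45L, 86L, 35L, 24L, 55L, 5L, 255L),
  check.names = FALSE  # keep the column name with spaces as-is
)

# 'totais' is a hypothetical name used only in this sketch
totais <- data.test2 %>%
  group_by(Curso) %>%
  summarise(
    Total_Vagas1 = sum(`Vaga 1 Sem`),            # sums only the rows of the current group
    n_grupo      = length(`Vaga 1 Sem`),         # number of rows seen inside the group
    n_dollar     = length(data.test2$`Vaga 1 Sem`)  # full column, ignores the grouping
  )

print(totais)
```

With this data, Total_Vagas1 should come out as 143, 59 and 315 (which add up to the 517 seen in the question), n_grupo as 3, 2 and 3, and n_dollar as 8 in every row.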