In R, using dplyr, create a new matrix




Suppose I have the following data frame:

   zona  candidato votos
    1     A         100
    1     B          20
    2     A          30
    2     B          15

I want, using dplyr, the following matrix:


   zona  votos_zona   votosA  votosB
     1      120         100       20
     2      45           30       15

I tried something like this

 nova <- data %>%
                     group_by(zona) %>%
                     summarise(votos_zona= sum(votos), 
                               votosA =      ,
                               votosB =         )

But I can’t work out how to complete the code.

2 answers


I think a function from another Hadley package, tidyr, works here:

data %>% spread(candidato, votos)

  zona   A  B
1    1 100 20
2    2  30 15

Note that if there are many candidate names, you do not have to type them one by one:

> data <- data.frame(zona = c(1,1,1,1), candidato = c("A", "B", "C", "D"), votos = c(100,20,30,15))
> data
  zona candidato votos
1    1         A   100
2    1         B    20
3    1         C    30
4    1         D    15
> data %>% spread(candidato, votos)
  zona   A  B  C  D
1    1 100 20 30 15
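In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), so the same idea can be written as below. This is only a sketch: the data frame is retyped from the question, and adding votos_zona with mutate() before reshaping is my own assumption about how to get the exact layout the question asks for, not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Data retyped from the question
data <- data.frame(
  zona      = c(1, 1, 2, 2),
  candidato = c("A", "B", "A", "B"),
  votos     = c(100, 20, 30, 15)
)

nova <- data %>%
  group_by(zona) %>%
  mutate(votos_zona = sum(votos)) %>%   # total per zone, repeated on each row
  ungroup() %>%
  pivot_wider(names_from   = candidato, # one column per candidate
              values_from  = votos,
              names_prefix = "votos")   # gives votosA, votosB, ...

# Columns: zona, votos_zona, votosA, votosB
nova
```

With more candidates the extra columns (votosC, votosD, ...) appear without any change to the code.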


You can put the condition inside the sum:

data %>%
  group_by(zona) %>%
  summarise(votos_zona = sum(votos),
            votosA = sum(votos[candidato == "A"]),
            votosB = sum(votos[candidato == "B"]))
Source: local data frame [2 x 4]

  zona votos_zona votosA votosB
1    1        120    100     20
2    2         45     30     15
  • I liked the solution. It also works for adding other variables related to the candidates. Thank you.

  • Yesterday I checked the code and everything worked. Today, when I ran it again, I got a one-row matrix, as if I hadn't used group_by(zona). It gave me only the overall total of each variable I asked for: votos_zona = 165, votosA = 130, votosB = 35.

  • @Vasco for some reason you loaded dplyr first and plyr second. That causes this problem because plyr's summarise function masks dplyr's, so the grouping is ignored. Restart R and load only dplyr, or load plyr first and then dplyr (in that order). A question on the English-language Stack Overflow covers this.
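If restarting is inconvenient, another way around the masking described in the comment above is to call the function through its namespace, so it does not matter which package was attached last. A minimal sketch, with the data frame retyped from the question:

```r
library(dplyr)

# Data retyped from the question
data <- data.frame(
  zona      = c(1, 1, 2, 2),
  candidato = c("A", "B", "A", "B"),
  votos     = c(100, 20, 30, 15)
)

# dplyr::summarise always refers to dplyr's version,
# even if plyr was attached afterwards and masks summarise
nova <- data %>%
  group_by(zona) %>%
  dplyr::summarise(votos_zona = sum(votos),
                   votosA = sum(votos[candidato == "A"]),
                   votosB = sum(votos[candidato == "B"]))

nova
```

The result should again have one row per zona rather than a single row of overall totals.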
