In R, using dplyr, create a new matrix

Suppose I have the following data frame

 >data
   zona  candidato votos
    1     A         100
    1     B          20
    2     A          30
    2     B          15

Using dplyr, I want to obtain the following table

   >nova

   zona  votos_zona   votosA  votosB
     1      120         100       20
     2      45           30       15

I tried something like this

 nova <- data %>%
                     group_by(zona) %>%
                     summarise(votos_zona= sum(votos), 
                               votosA =      ,
                               votosB =         )

But I can’t complete the code

2 answers

4


I think a function from another of Hadley's packages, tidyr, can be used here.

require(tidyr)
data %>% spread(candidato, votos)

  zona   A  B
1    1 100 20
2    2  30 15

Note that if there are many candidate names, you do not have to type them out one by one: spread creates one column per value of candidato.

> data <- data.frame(zona = c(1,1,1,1), candidato = c("A", "B", "C", "D"), votos = c(100,20,30,15))
> data
  zona candidato votos
1    1         A   100
2    1         B    20
3    1         C    30
4    1         D    15
> data %>% spread(candidato, votos)
  zona   A  B  C  D
1    1 100 20 30 15
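
The output above does not include the votos_zona total that the question asks for. A minimal sketch of one way to get it, assuming tidyr 1.0.0 or later is installed (pivot_wider is the newer replacement for spread): compute the per-zone total first, then widen. The "votos" prefix is only chosen here to match the column names in the question.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
data <- data.frame(
  zona = c(1, 1, 2, 2),
  candidato = c("A", "B", "A", "B"),
  votos = c(100, 20, 30, 15)
)

nova <- data %>%
  group_by(zona) %>%
  mutate(votos_zona = sum(votos)) %>%   # total votes per zone, repeated on each row
  ungroup() %>%
  pivot_wider(
    names_from = candidato,             # one new column per candidate
    values_from = votos,
    names_prefix = "votos"              # gives votosA, votosB, ...
  )

nova
```

Because the total is computed before widening, this should keep working with any number of candidates without naming them.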

1

You can put the condition inside the sum:

data %>%
  group_by(zona) %>%
  summarise(votos_zona = sum(votos),
            votosA = sum(votos[candidato == "A"]),
            votosB = sum(votos[candidato == "B"]))
Source: local data frame [2 x 4]

  zona votos_zona votosA votosB
1    1        120    100     20
2    2         45     30     15
  • I liked this solution. It also lets me add other variables related to the candidates. Thank you.

  • Yesterday I ran the code and everything worked. Today, when I ran it again, I got a one-row result, as if group_by(zona) had not been applied. It gave me only the overall total of each variable I asked for: votos_zona votosA votosB 1 165 130 35

  • @Vasco for some reason you loaded dplyr first and then plyr. That causes the problem, because plyr's summarise function masks the one from dplyr. Restart R and load only dplyr, or load plyr first and then dplyr (in that order). See this question on the English Stack Overflow: http://stackoverflow.com/questions/26106146/using-dplyr-to-summarise-by-group/26106218#26106218
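
If restarting R is not convenient, the masking described in the last comment can probably also be sidestepped by calling the function with an explicit namespace. A minimal sketch, assuming only dplyr is needed for the computation; dplyr::summarise refers to dplyr's version regardless of which packages were attached afterwards.

```r
library(dplyr)

# Sample data from the question
data <- data.frame(
  zona = c(1, 1, 2, 2),
  candidato = c("A", "B", "A", "B"),
  votos = c(100, 20, 30, 15)
)

nova <- data %>%
  group_by(zona) %>%
  dplyr::summarise(                      # explicit namespace: not affected by masking
    votos_zona = sum(votos),
    votosA = sum(votos[candidato == "A"]),
    votosB = sum(votos[candidato == "B"])
  )

nova
```

With the grouping respected, the result has one row per zona rather than a single row of overall totals.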
