How to return the most prevalent category associated with a group?


I have a data set in which the variable a is the grouping variable and b is a variable with some categories. My goal is, within each group of a, to return the category of b that appears most often.

Consider the following dput:

dataset=structure(list(a = c(500, 500, 500, 400, 400, 400, 300, 300, 
300), b = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("a", 
"b"), class = "factor")), class = "data.frame", row.names = c(NA, 
-9L))

Desired result:

  a  b
500  a
400  b
300  a
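For comparison, here is a minimal base R sketch of this first goal that does not need dplyr. It rebuilds the same values as the dput above with data.frame(), and the helper name most_frequent is hypothetical. On ties, which.max() keeps the first level in factor order, so only one category is returned per group.

```r
# Same values as the dput above
dataset <- data.frame(
  a = c(500, 500, 500, 400, 400, 400, 300, 300, 300),
  b = factor(c("a", "b", "a", "b", "a", "b", "a", "b", "a"))
)

# Hypothetical helper: the factor level that occurs most often in x.
# which.max() returns the first maximum, so ties go to the earlier level.
most_frequent <- function(x) names(which.max(table(x)))

# Apply the helper within each group of a (groups come out sorted: 300, 400, 500)
modes <- tapply(dataset$b, dataset$a, most_frequent)
result <- data.frame(
  a = as.numeric(names(modes)),
  b = as.vector(modes),
  stringsAsFactors = FALSE
)
result
```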

It would also be useful to return the count and percentage of the most frequent category in each group, something like:

  a  b  count  percent
500  a      2     0.67  # 67%
400  b      2     0.67  # 67%
300  a      2     0.67  # 67%
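The counts and percentages can also be sketched in base R with a two-way table(): rows are the groups of a, columns are the categories of b, and which.max() picks the top column in each row (the first one on ties). The object names tab, top and counts are illustrative only.

```r
# Same values as the dput above
dataset <- data.frame(
  a = c(500, 500, 500, 400, 400, 400, 300, 300, 300),
  b = factor(c("a", "b", "a", "b", "a", "b", "a", "b", "a"))
)

# Cross-tabulate: one row per group of a, one column per category of b
tab <- table(dataset$a, dataset$b)

# Column index of the most frequent category in each row (first one on ties)
top <- apply(tab, 1, which.max)

# Pick the count in each row's top column via matrix indexing
counts <- tab[cbind(seq_len(nrow(tab)), top)]

result <- data.frame(
  a = as.numeric(rownames(tab)),
  b = colnames(tab)[top],
  count = as.vector(counts),
  percent = as.vector(counts / rowSums(tab)),  # proportion within the group
  stringsAsFactors = FALSE
)
result
```

Here percent is a proportion (2/3, about 0.667, for every group in this data); multiply by 100 to get a percentage.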

1 answer



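Using the dplyr package (the .groups argument needs dplyr 1.0.0 or later):

library(dplyr)

dataset %>%
  group_by(a, b) %>%
  summarise(count = n(), .groups = "drop_last") %>%  # one row per (a, b); result stays grouped by a
  mutate(percent = count / sum(count)) %>%           # share of each category within its group of a
  filter(count == max(count)) %>%                    # most frequent category per group; ties keep every tied row
  ungroup()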
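The filter() step above returns more than one row for a group when two categories tie for the maximum. If exactly one row per group is wanted, a sketch using count() and slice_max(with_ties = FALSE) (both available in dplyr 1.0.0 and later) could look like the following. On ties it should keep the first row in the order produced by count(), i.e. the earlier level of b.

```r
library(dplyr)

# Same values as the dput in the question
dataset <- data.frame(
  a = c(500, 500, 500, 400, 400, 400, 300, 300, 300),
  b = factor(c("a", "b", "a", "b", "a", "b", "a", "b", "a"))
)

result <- dataset %>%
  count(a, b, name = "count") %>%                # one row per (a, b) pair with its frequency
  group_by(a) %>%
  mutate(percent = count / sum(count)) %>%       # share of each category within its group
  slice_max(count, n = 1, with_ties = FALSE) %>% # exactly one row per group, even on ties
  ungroup()
result
```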
