How to apply the tapply function to multiple data frames in R?




I want to call tapply just once and get the results for every data frame at the same time:

dataset1<-data.frame(group = rep(c('a','b','c','d'), 3, each = 3),
                     number1 = c(1:36), number2 = c(1:36))

dataset2<-data.frame(group = rep(c('a','b','c','d'), 3, each = 3),
                     number1 = c(36:71), number2 = c(36:71))

dataset3<-data.frame(group = rep(c('a','b','c','d'), 3, each = 3),
                     number1 = c(71:106), number2 = c(71:106))

If possible, I'd like the solution to cover all three data frames and both numeric variables in each of them.

2 answers


First, it’s best to have all the data frames in a list. I’m going to build it with a combination of ls and mget.

ls(pattern = "^dataset\\d+$")
#[1] "dataset1" "dataset2" "dataset3"

df_list <- mget(ls(pattern = "^dataset\\d+$"))
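Note that ls(pattern = ...) picks up every object in the workspace whose name matches the pattern. If that is too broad, a minimal sketch of building the same named list by hand could look like this. The helper make_df and the name df_named are hypothetical, introduced only so the example runs on its own:

```r
# Hypothetical helper that rebuilds the question's example data.
make_df <- function(start) {
  data.frame(group = rep(c('a','b','c','d'), 3, each = 3),
             number1 = start:(start + 35),
             number2 = start:(start + 35))
}

dataset1 <- make_df(1)
dataset2 <- make_df(36)
dataset3 <- make_df(71)

# Name each element explicitly so the results keep the data frame names.
df_named <- list(dataset1 = dataset1,
                 dataset2 = dataset2,
                 dataset3 = dataset3)

names(df_named)
```

Either way the list elements carry the data frame names, so the output of lapply is labelled too.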

Now just apply a function to every data frame in the list.

result_list <- lapply(df_list, function(DF){
    s2 <- tapply(DF[[2]], DF[[1]], sum)    # sums of number1 by group
    s3 <- tapply(DF[[3]], DF[[1]], sum)    # sums of number2 by group
    rbind(s2, s3)
})

rm(df_list)    # No longer needed

result_list
#$dataset1
#     a   b   c   d
#s2 126 153 180 207
#s3 126 153 180 207
#
#$dataset2
#     a   b   c   d
#s2 441 468 495 522
#s3 441 468 495 522
#
#$dataset3
#     a   b   c   d
#s2 756 783 810 837
#s3 756 783 810 837
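The function above hard-codes columns 2 and 3. As a rough sketch, assuming the grouping column is called group and every other numeric column should be summed, the same idea can be written so it does not depend on column positions. The names make_df and sum_by_group are hypothetical:

```r
# Hypothetical helper that rebuilds the question's example data.
make_df <- function(start) {
  data.frame(group = rep(c('a','b','c','d'), 3, each = 3),
             number1 = start:(start + 35),
             number2 = start:(start + 35))
}
df_list <- list(dataset1 = make_df(1),
                dataset2 = make_df(36),
                dataset3 = make_df(71))

# Sum every numeric column by group (hypothetical function name).
sum_by_group <- function(DF, group_col = "group") {
  num_cols <- names(DF)[sapply(DF, is.numeric)]
  # One tapply per numeric column; sapply binds the results into a matrix
  # with one row per group and one column per variable.
  sapply(DF[num_cols], function(col) tapply(col, DF[[group_col]], sum))
}

result_list <- lapply(df_list, sum_by_group)
result_list$dataset1
```

Here the groups end up in the rows and the variables in the columns, which is the transpose of the rbind layout used above.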


Another option is to use nested for loops. This approach is not advisable if the data is very large.


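list_df = list(dataset1, dataset2, dataset3)

for (i in 1:length(list_df)){
  df = list_df[[i]]                    # current data frame
  for (j in 2:ncol(df)) {              # every column after the group column
    x = tapply(df[,j], df[,1], sum)
    print(x)
  }
  print("----------")                  # separator between data frames
}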
#  a   b   c   d 
#126 153 180 207 
#  a   b   c   d 
#126 153 180 207 
#[1] "----------"
#  a   b   c   d 
#441 468 495 522 
#  a   b   c   d 
#441 468 495 522 
#[1] "----------"
#  a   b   c   d 
#756 783 810 837 
#  a   b   c   d 
#756 783 810 837 
#[1] "----------"
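If the loops become a bottleneck, one possible alternative (not part of the answer above) is to stack the data frames with an identifier column and call tapply once per variable with two grouping factors. This is a sketch under that assumption; make_df, stacked, sums1 and sums2 are hypothetical names:

```r
# Hypothetical helper that rebuilds the question's example data.
make_df <- function(start) {
  data.frame(group = rep(c('a','b','c','d'), 3, each = 3),
             number1 = start:(start + 35),
             number2 = start:(start + 35))
}
list_df <- list(dataset1 = make_df(1),
                dataset2 = make_df(36),
                dataset3 = make_df(71))

# Tag each data frame with its name, then stack them into one.
stacked <- do.call(rbind, lapply(names(list_df), function(id) {
  DF <- list_df[[id]]
  DF$dataset <- id
  DF
}))

# One tapply per variable, grouped by data frame and by group.
sums1 <- tapply(stacked$number1, list(stacked$dataset, stacked$group), sum)
sums2 <- tapply(stacked$number2, list(stacked$dataset, stacked$group), sum)
sums1
```

Each result is a matrix with one row per data frame and one column per group, so all sums for a variable come from a single call.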
