Use lapply (or a replacement for it) to keep only the columns common to multiple data frames in a new list

dput of the list:

structure(list(col1 = structure(list(a = 1:5, b = 1:5, c = 1:5), .Names = c("a", 
"b", "c"), row.names = c(NA, -5L), class = "data.frame"), col2 = structure(list(
    a = 6:10, c = 6:10), .Names = c("a", "c"), row.names = c(NA, 
-5L), class = "data.frame"), col3 = structure(list(a = 11:15, 
    c = 11:15), .Names = c("a", "c"), row.names = c(NA, -5L), class = "data.frame"), 
    col4 = structure(list(a = 16:20, b = 16:20), .Names = c("a", 
    "b"), row.names = c(NA, -5L), class = "data.frame"), col5 = structure(list(
        a = 21:25, c = 21:25), .Names = c("a", "c"), row.names = c(NA, 
    -5L), class = "data.frame")), .Names = c("col1", "col2", 
"col3", "col4", "col5"))

The only common column among them is a.

This is what I tried to do:

newlist<-lapply(1:length(list),function(x)colnames(x))

But it returns NULL.

I also tried to use merge (with lapply) to aggregate these dataframes (considering all=TRUE,by='row.names',incomparables=NA,sort=FALSE), but without success.

1 answer

Use the plyr package to combine the original list (here named dados) into a single data frame:

library(plyr)
res <- ldply(dados, data.frame)

res is a data frame with the columns a, b and c, plus an .id column holding the name of the list element each row came from. Since b and c are not present in every element of dados, they contain NA. The select_if function from dplyr lets us keep only the columns of res that contain no NA:

library(dplyr)
res <- res %>%
  select_if(~ !any(is.na(.)))
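
To see what this filter does without depending on dplyr, here is a minimal base R sketch of the same column test. The data frame toy and the names keep and toy_complete are made up for illustration and are not part of the original answer.

```r
# Toy data frame shaped like res: b contains an NA, a and c do not.
toy <- data.frame(.id = c("col1", "col2"), a = 1:2, b = c(1L, NA), c = 3:4)

# TRUE for every column that contains no NA at all.
keep <- colSums(is.na(toy)) == 0

# drop = FALSE keeps the result a data frame even if only one column survives.
toy_complete <- toy[, keep, drop = FALSE]
names(toy_complete)
```

Only b is removed here, because it is the only column with a missing value.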

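Now split res, which no longer has any NA columns, back into a list with the split function. The first column (.id) is dropped from the data and used as the grouping factor, so each original element gets its own data frame. Because only one column (a) remains after dropping .id, drop = FALSE is needed to keep it a data frame instead of a plain vector:

split(res[, -1, drop = FALSE], res$.id)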
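
For comparison, the same result can be sketched in base R with lapply, which is what the question originally attempted. The attempt there returned NULL because the function received the integers 1:length(list) rather than the data frames themselves. This is only a minimal sketch, assuming the list is stored in dados as above; the names common and newlist are illustrative.

```r
# Same data as the dput in the question, rebuilt by hand.
dados <- list(
  col1 = data.frame(a = 1:5, b = 1:5, c = 1:5),
  col2 = data.frame(a = 6:10, c = 6:10),
  col3 = data.frame(a = 11:15, c = 11:15),
  col4 = data.frame(a = 16:20, b = 16:20),
  col5 = data.frame(a = 21:25, c = 21:25)
)

# Column names of each data frame, reduced to those shared by all of them.
common <- Reduce(intersect, lapply(dados, colnames))

# Iterate over the data frames themselves (not over indices) and keep
# only the shared columns; drop = FALSE preserves the data frame class.
newlist <- lapply(dados, function(df) df[, common, drop = FALSE])
```

This avoids the intermediate merged data frame, and it also works when a column that is present everywhere happens to contain genuine NA values, which the NA-based filter would discard.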
