How to delete elements from a list based on a condition?

3

I used the following function:

splitfile<-split(training,list(training$group1,training$city))

and this gives me a list of data frames with different numbers of rows, based on the variables I selected.

However, a data frame with 0 rows is also returned, because one category of one variable never occurs together with a category of the other variable.

My goal is to delete this 0-row data frame from the list.

But I need to do this with a condition, because some cases may return several data frames with 0 rows, and it would not be efficient to remove them one by one.

dput output to help with answers:

training=structure(list(bin = c(0, 0, 0, 0, 1, 1, 0, 0, 1, 1), modalidade = structure(c(1L, 
3L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("unik", "opfin", 
"compras"), class = "factor"), group1 = c(1, 2, 2, 1, 2, 1, 2, 
1, 2, 1), missing = c(NA, 4, 5, NA, 7, 6, NA, NA, 4, 5), score1 = c(3, 
2, 4, 4, 7, 6, 4, 3, 6, 7), valor = c(100, 200, 321, 34, 3424, 
2344, 4232, 43, 22, 22), gender = c("M", "M", "M", "M", "M", 
"F", "F", "F", "F", "F"), via = structure(c(2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L), .Label = c("1via", "2via"), class = "factor"), 
income = c(1605.52545357496, 1957.10460608825, 3463.77286640927, 
2241.49697413668, 2575.95523679629, 3004.28174249828, 3458.30937661231, 
1786.68619645759, 2065.093211364, 1561.55416276306), city = c("San Francisco", 
"Santa Monica", "Santa Monica", "Santa Monica", "Santa Monica", 
"Hollywood", "Hollywood", "Hollywood", "Hollywood", "Hollywood"
), CPF = c(38676865809, 43245353454, 34565456765, 38676865809, 
38676865809, 44322211189, 44322211189, 12345678900, 12345678900, 
33444455590), desbloq = structure(c(10553, 9537, 10553, 10553, 
9212, 10658, 10957, 11822, 11822, 10188), class = "Date"), 
trans = structure(c(10556, 9541, 10555, 10554, 9218, 10660, 
10958, 11823, 11826, 10190), class = "Date")), row.names = c(NA, 
-10L), .Names = c("bin", "modalidade", "group1", "missing", "score1", 
"valor", "gender", "via", "income", "city", "CPF", "desbloq", 
"trans"), class = "data.frame")

If the answer uses a package that does this elegantly, even better.

1 answer

4


Removing all the zero-row data frames at once is not that difficult. It is even easier than it looks: a single statement does it.

Explained in parts.

First, I'll use sapply rather than lapply, because lapply always returns a list, while sapply is the same function with simplify = TRUE, so here it returns a named vector.
And NROW rather than nrow, because it also works with objects that have no dimension attribute.

sapply(splitfile, NROW)
#    1.Hollywood     2.Hollywood 1.San Francisco 2.San Francisco 
#              3               2               1               0 
# 1.Santa Monica  2.Santa Monica 
#              1               3
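
To make the two choices above concrete, here is a small sketch on toy objects. The names x, dfs, counts_list and counts_vec are made up for illustration and are not part of the question's data.

```r
# A plain vector has no dim attribute, so nrow() returns NULL,
# while NROW() treats it as a one-column object and returns its length.
x <- 1:3
nrow(x)   # NULL
NROW(x)   # 3

# Hypothetical toy list: one data frame with rows, one empty.
dfs <- list(a = data.frame(v = 1:2), b = data.frame(v = integer(0)))

# lapply() always returns a list ...
counts_list <- lapply(dfs, NROW)

# ... while sapply() simplifies the result to a named vector,
# which is convenient for building a logical index.
counts_vec <- sapply(dfs, NROW)
```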

Secondly, just turn these results into a logical index:

sapply(splitfile, NROW) != 0
#    1.Hollywood     2.Hollywood 1.San Francisco 2.San Francisco 
#           TRUE            TRUE            TRUE           FALSE 
# 1.Santa Monica  2.Santa Monica 
#           TRUE            TRUE

And use that index to get what you want.

result <- splitfile[sapply(splitfile, NROW) != 0]
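
As a possible complement, base R offers two other ways to reach the same result. This is only a sketch: it rebuilds just the group1 and city columns from the question's dput so that it runs on its own, and the names result_filter and result_drop are made up for illustration.

```r
# Only the two grouping columns from the question's data.
training <- data.frame(
  group1 = c(1, 2, 2, 1, 2, 1, 2, 1, 2, 1),
  city = c("San Francisco", "Santa Monica", "Santa Monica", "Santa Monica",
           "Santa Monica", "Hollywood", "Hollywood", "Hollywood",
           "Hollywood", "Hollywood")
)

splitfile <- split(training, list(training$group1, training$city))

# Option 1: Filter() keeps the elements for which the predicate is TRUE.
result_filter <- Filter(function(d) nrow(d) > 0, splitfile)

# Option 2: avoid creating the empty groups in the first place;
# drop = TRUE discards combinations that do not occur in the data.
result_drop <- split(training, list(training$group1, training$city),
                     drop = TRUE)

# With a package (assumes purrr is installed), the equivalent would be:
# result_purrr <- purrr::keep(splitfile, ~ nrow(.x) > 0)
```

If the empty combinations are never needed, drop = TRUE is probably the simplest choice, since nothing has to be removed afterwards.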
