How to remove columns from a data frame?

I have a data frame with 275 variables and would like to remove the variables that do not contribute significantly, that is, those with a nonzero value in fewer than 10 rows. Can someone help me?

  • 1

    See help: http://stackoverflow.com/questions/4605206/drop-data-frame-columns-by-name

  • 1

    It helped!! Thank you very much!

1 answer

One way to do this is with the select_if function from the dplyr package.

First define a function that counts the number of zeros:

# Count how many entries of x are equal to zero
contar_zeros <- function(x) {
  sum(x == 0)
}
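
One caveat: `sum(x == 0)` returns `NA` if the column contains missing values, so the comparison with 10 then also yields `NA` rather than `TRUE` or `FALSE`. A small sketch of an NA-tolerant variant, assuming missing values should simply be ignored (the name `contar_zeros_na` and the vector `v` are made up for illustration):

```r
# Hypothetical NA-tolerant variant of contar_zeros:
# missing values are skipped rather than propagated.
contar_zeros_na <- function(x) {
  sum(x == 0, na.rm = TRUE)
}

v <- c(0, 3, NA, 0)
sum(v == 0)          # NA, because of the missing value
contar_zeros_na(v)   # 2
```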

Now consider this data frame:

library(dplyr)

# tibble() replaces the deprecated data_frame()
df <- tibble(
  x = 0,
  y = 1:10,
  z = c(rep(0,5), 6:10)
)
df
# A tibble: 10 × 3
       x     y     z
   <dbl> <int> <dbl>
1      0     1     0
2      0     2     0
3      0     3     0
4      0     4     0
5      0     5     0
6      0     6     6
7      0     7     7
8      0     8     8
9      0     9     9
10     0    10    10

Using select_if:

# Keep only the columns that contain fewer than 10 zeros
df_sem_colunas <- select_if(df, function(col) contar_zeros(col) < 10)
df_sem_colunas
# A tibble: 10 × 2
       y     z
   <int> <dbl>
1      1     0
2      2     0
3      3     0
4      4     0
5      5     0
6      6     6
7      7     7
8      8     8
9      9     9
10    10    10
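
Note that the filter above counts zeros, which matches the question only because the example has exactly 10 rows. For a data frame with more rows, the question's criterion (drop columns with fewer than 10 nonzero values) can be expressed by counting nonzero entries directly. A minimal sketch in base R, assuming all columns are numeric; the data and the names `df_big`, `min_nonzero`, `nonzero_counts` and `df_kept` are made up for illustration:

```r
# Hypothetical example data: 20 rows, so "fewer than 10 nonzero values"
# and "fewer than 10 zeros" are no longer the same condition.
df_big <- data.frame(
  x = 0,                      # 0 nonzero values
  y = 1:20,                   # 20 nonzero values
  z = c(rep(0, 15), 16:20)    # 5 nonzero values
)

min_nonzero <- 10  # assumed threshold, taken from the question

# Count nonzero entries per column; NA entries are ignored.
nonzero_counts <- colSums(df_big != 0, na.rm = TRUE)

# Keep only the columns with at least `min_nonzero` nonzero values.
df_kept <- df_big[, nonzero_counts >= min_nonzero, drop = FALSE]
names(df_kept)
```

With dplyr 1.0.0 or later, the same filter could probably be written as `select(df_big, where(function(col) sum(col != 0, na.rm = TRUE) >= min_nonzero))`; `select_if` still works there but is marked as superseded.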
