2
I have a data frame with 275 variables and would like to remove the variables that do not contribute significantly (those that have a nonzero value fewer than 10 times). Can someone help me?
1
One way to do this is with the select_if function from the dplyr package.
First, define a function that counts the number of zeros in a column:
# Count how many elements of x are equal to zero
contar_zeros <- function(x) {
  sum(x == 0)
}
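Note that sum(x == 0) returns NA when the column contains missing values, so the comparison with 10 would be NA as well. A minimal sketch of an NA-tolerant variant, assuming missing values should simply be ignored (the name contar_zeros_na is made up for illustration and is not part of the original answer):

```r
# Hypothetical variant of contar_zeros: counts zeros while ignoring NAs.
# Assumption: a missing value should not count as a zero.
contar_zeros_na <- function(x) {
  sum(x == 0, na.rm = TRUE)
}

# Small check on a vector with one missing value
exemplo <- c(0, NA, 1, 0)
n_zeros <- contar_zeros_na(exemplo)
```

Whether NAs should be ignored, or treated as zeros, depends on what a missing value means in your data.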
Now consider this data frame:
library(dplyr)

# tibble() replaces the deprecated data_frame(); x = 0 is recycled to 10 rows
df <- tibble(
  x = 0,
  y = 1:10,
  z = c(rep(0, 5), 6:10)
)
df
# A tibble: 10 × 3
       x     y     z
   <dbl> <int> <dbl>
 1     0     1     0
 2     0     2     0
 3     0     3     0
 4     0     4     0
 5     0     5     0
 6     0     6     6
 7     0     7     7
 8     0     8     8
 9     0     9     9
10     0    10    10
Then use select_if to keep only the columns that have fewer than 10 zeros:
df_sem_colunas <- select_if(df, function(col) contar_zeros(col) < 10)
df_sem_colunas
# A tibble: 10 × 2
       y     z
   <int> <dbl>
 1     1     0
 2     2     0
 3     3     0
 4     4     0
 5     5     0
 6     6     6
 7     7     7
 8     8     8
 9     9     9
10    10    10
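Note that the example above filters on the number of zeros, so z is kept even though it has only 5 nonzero values. To apply the rule exactly as stated in the question (drop columns with fewer than 10 nonzero values), a possible sketch counts nonzero values instead. It assumes all columns are numeric and dplyr 1.0.0 or later for where(); the names contar_nao_zeros, minimo, df_filtrado and df_filtrado_base are made up for illustration:

```r
library(dplyr)

# Hypothetical helper: counts the nonzero values in a column, ignoring NAs
contar_nao_zeros <- function(x) sum(x != 0, na.rm = TRUE)

# Same example data as above
df <- tibble(
  x = 0,
  y = 1:10,
  z = c(rep(0, 5), 6:10)
)

# Minimum number of nonzero values a column needs in order to be kept
minimo <- 10

# dplyr: keep only the columns that reach the threshold
df_filtrado <- select(df, where(function(col) contar_nao_zeros(col) >= minimo))

# Base R alternative: colSums() over the logical matrix df != 0
df_filtrado_base <- df[, colSums(df != 0, na.rm = TRUE) >= minimo, drop = FALSE]
```

With this data only y remains, since x has no nonzero values and z has 5. For the 275-variable data frame, the same call should work with df replaced by your own data.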
See help: http://stackoverflow.com/questions/4605206/drop-data-frame-columns-by-name
– eightShirt
It helped!! Thank you very much!
– Carolina Bury